OASIS Mailing List Archives

   Re: [xml-dev] The subsetting has begun


/ ari@cogsci.ed.ac.uk (K. Ari Krupnikov) was heard to say:
| Elliotte Rusty Harold <elharo@metalab.unc.edu> writes:
|> I suspect part of the problem is that the members of the expert group
|> did not have a clear understanding of the difference between
|> validation and reading the DTD, between the DTD and the document type
|> declaration, and between the internal and external DTD subsets.
|> These are common areas of confusion for a lot of developers. However,
|> if you're going to write specs, you need to understand such matters
|> better than the average developer.
| It would be interesting to hear what people like Norman Walsh think
| about it.

Gulp. I'm not sure precisely what you want my opinion on. For the
record, I work for Sun. (Someone replied privately to a message I sent
a few weeks ago suggesting that "my cover was blown" when in fact I
had no intention of keeping a cover at all. I've just been subscribed
to this list longer than I've worked for Sun :-)

I've spoken to the folks working on JSR 172 and I think they
understand the distinctions to which Elliotte Rusty Harold alludes.
They're building a SOAP processor for devices with a code footprint of
something like 25 KB (*kilo*bytes). I think there's room for their
spec to be clearer about the decisions they've made, why they've made
them, and the ways in which the API they're exposing is intended to be
used. And I think they're going to make those changes.

I think it's more valuable to look at the broader issues here.

As it happens, I'm giving a presentation for the TAG on the
xmlProfiles-29 issue on Wednesday at the technical plenary. My rough
draft slides are in a public space[1], so feel free to peek at them.
But I may change them before Wednesday.

As far as I can see, the following statements are true:

1. People will subset XML. They already have.

2. Developers will write code that only processes those subsets.

3. The result will be reduced interoperability if developers think
   that they can use that code for general-purpose XML processing.

It looks to me like the single biggest hunk-o-stuff that people want
to get rid of in subsets is the DTD processing. I can even imagine a
world in the distant future where schema processors are widespread,
well-understood, and fast enough that documents don't often have
document type declarations. That's a world in which we all might
benefit from smaller parsers.

So when I first started thinking about this issue, I thought that it
might make sense to define a single new subset of XML. Basically, XML
1.1 without DTDs. I even wrote a spec for it:

  1 Introduction

  Extensible Markup Language Kernel, abbreviated XMLK, describes a
  subset of the class of data objects called XML documents defined by
  [XML], as amended by [XML 1.1].

  The design goals for XMLK are:

   1. XMLK documents shall be backwards compatible with XML 1.1.
   2. XMLK documents shall be standalone.

  This specification, together with [XML 1.1], provides all the
  information necessary to understand XMLK Version 1.0 and construct
  computer programs to process it.

  2 Definition

  XMLK 1.0 is identical to XML 1.1 with the following single,
  normative change. Production 22 is replaced with:

  [22] prolog ::= XMLDecl? Misc? [WFC: Document Type Declaration]

  Well-formedness constraint: Document Type Declaration

    A document type declaration must not occur. XMLK documents cannot
    contain an internal or external subset.

  With this change, a number of validity and well-formedness
  constraints are trivially satisfied, but they hold nonetheless.
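
A minimal sketch of how a processor might enforce the "Document Type
Declaration" constraint above. It is an illustration only, not part of
the XMLK draft. The names check_xmlk and DoctypeNotAllowed are
hypothetical, and because Python's bundled expat parser implements
XML 1.0 rather than XML 1.1, this only approximates the subset
described:

```python
import xml.parsers.expat


class DoctypeNotAllowed(Exception):
    """Raised when a document contains a document type declaration."""


def check_xmlk(data: bytes) -> bool:
    """Return True if data is well-formed and has no DOCTYPE.

    Raises DoctypeNotAllowed if a document type declaration occurs,
    and xml.parsers.expat.ExpatError if the input is not well-formed.
    """
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Fires for any <!DOCTYPE ...>, whether it carries an internal
        # subset, an external subset, or neither.
        raise DoctypeNotAllowed(f"document type declaration for {name!r}")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)  # True marks this as the final chunk
    return True
```

For example, check_xmlk(b"<a><b/></a>") returns True, while a document
that begins with <!DOCTYPE a SYSTEM "a.dtd"> raises DoctypeNotAllowed
before any element content is read.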

As time has passed and there's been more pushback against the idea of
a new subset, my conviction has wavered.

Perhaps the right answer is simply to say that a processor for the
subset of XML defined by "foo" should be called a "foo processor" and
not an XML processor.

The argument that "foo" isn't XML probably isn't very interesting from
a purely practical standpoint. But maybe we can get everyone to agree
to call a spade a spade.

                                        Be seeing you,

[1] http://www.w3.org/2003/03/05/tag/xmlProfiles-29/

-- 
Norman.Walsh@Sun.COM    | To the man who is afraid everything
XML Standards Architect | rustles.--Sophocles
Web Tech. and Standards |
Sun Microsystems, Inc.  | 


