RE: Layering (was RE: Blueberry/Unicode/XML)

> -----Original Message-----
> From: Rick Jelliffe [mailto:ricko@allette.com.au]
> Sent: Tuesday, July 10, 2001 11:44 AM
> To: xml-dev@lists.xml.org
> Subject: Re: Layering (was RE: Blueberry/Unicode/XML)
> For a 16-layer! characterization of XML and layering, see my article
> "Goldilocks and XML"  at
> http://www.xml.com/pub/a/1999/12/sml/goldilocks.html

That's a very interesting article ... I wonder, though, if it would be more useful to think about a smaller number of layers, combining those in which the order doesn't seem to matter:

0) Transport (e.g., database retrieval or network receipt, decompression, etc.).  I don't think that XML per se has much to say about this ... certainly XQuery and SOAP do.  I've always thought of an XML processor as being handed a stream or buffer of text from somewhere; the XML processor shouldn't much care where.

1) Encoding / Normalization -- an XML processor would have to deal with these issues, but would ideally look to the Unicode spec for guidance (with the 20-25 exceptions discussed in the Blueberry thread noted, of course, in XML).

2) Preprocessing -- Basically, the various bits of "syntax sugar" need to be distilled into a more pure form: CDATA marked sections cleaned up, entity references expanded ... some of us would like to see comments stripped here but that's a problem as long as they stay in the InfoSet.

3) Parsing -- The preprocessor conceptually emits something very much like Common XML Core (except that PIs would have to be passed through); the parser itself can be very simple and just deal with the 15 or so productions needed to define elements, attributes, and text.

4) Infoset Post-processing -- There is much discussion within the W3C as to the order in which things should happen here, but one could think of it as one blob of processes in which default attribute values are added to the InfoSet instance, Namespace URIs are associated with whatever nodes need them and/or prefixes are normalized [this is a conceptual mess right now; different W3C specs have different conceptions of namespace prefixes and URIs], validation occurs, and the PSVI infoset is generated.

5) Application-level processing -- PIs processed however the application is going to process them; ID/IDREF links handled however the app needs to handle them, etc.  I don't know whether to put XLink processing here or in Level 4.
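
To make Levels 2 through 4 a little more concrete, here is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module (an illustrative choice only; nothing in the layering above depends on it, and that library fuses these levels into one call rather than exposing them separately). It shows that once the "syntax sugar" is distilled, a CDATA section and an escaped entity reference yield the same text, comments are gone, and two different prefixes bound to the same namespace URI yield the same element name. The URI urn:example is made up for the example.

```python
import xml.etree.ElementTree as ET

# Level 2: a CDATA section and an entity reference are sugar for the same text.
cdata = ET.fromstring("<p><![CDATA[a < b]]></p>")
escaped = ET.fromstring("<p>a &lt; b</p>")
same_text = cdata.text == escaped.text == "a < b"

# Level 2: the default tree builder drops comments, so only the text survives.
commented = ET.fromstring("<p><!-- noise -->x</p>")
comment_gone = commented.text == "x" and len(commented) == 0

# Level 4: prefixes are resolved to namespace URIs, so the prefix itself
# no longer matters (urn:example is a hypothetical URI).
one = ET.fromstring('<a:x xmlns:a="urn:example"/>')
two = ET.fromstring('<b:x xmlns:b="urn:example"/>')
same_name = one.tag == two.tag == "{urn:example}x"

print(same_text, comment_gone, same_name)
```

A consumer that only sees the resulting tree cannot tell which surface form was used, which is roughly what it means to "come in" above Level 2.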

The reminder about the OSI conceptual model is helpful ... This is probably more a conceptual model than a prescription for an actual implementation; there are probably good design reasons for combining steps in actual code.  Nevertheless, layering XML with explicit interfaces at each of these levels would let those of us who care about legacy encodings and other bits of I18N complexity come in at Level 1; those who care about comments, CDATA sections, and other bits of "noise" can come in at Level 2; the "simpletons" can come in at Level 3 and stay there; and the denizens of the bleeding edge can do their thing at Level 4. How different products choose to expose the interfaces and implement their actual code is up to the designers.
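
One way to picture "explicit interfaces at each level" is as a chain of small functions, each of which a caller can use as an entry or exit point. The following is only a sketch under stated assumptions: the function names level1_decode, level3_parse, and level4_defaults are hypothetical, Python's standard library stands in for a real layered processor, and Levels 2 and 3 are fused because xml.etree.ElementTree does not expose them separately.

```python
import unicodedata
import xml.etree.ElementTree as ET
from typing import Dict


def level1_decode(raw: bytes, encoding: str) -> str:
    """Level 1: decode bytes in a (possibly legacy) encoding and NFC-normalize."""
    return unicodedata.normalize("NFC", raw.decode(encoding))


def level3_parse(text: str) -> ET.Element:
    """Levels 2-3: preprocessing and parsing, fused by ElementTree."""
    return ET.fromstring(text)


def level4_defaults(root: ET.Element,
                    defaults: Dict[str, Dict[str, str]]) -> ET.Element:
    """Level 4 (one small part): fill in default attribute values."""
    for elem in root.iter():
        for name, value in defaults.get(elem.tag, {}).items():
            elem.attrib.setdefault(name, value)
    return root


# A document arriving from "somewhere" (Level 0) in a legacy encoding.
raw = "<doc><item>caf\u00e9</item><item kind='x'>tea</item></doc>".encode("iso-8859-1")

text = level1_decode(raw, "iso-8859-1")   # I18N folks work here
tree = level3_parse(text)                 # "simpletons" can stop here
texts = [item.text for item in tree.iter("item")]

# The defaults table is a hypothetical stand-in for what a DTD or schema supplies.
tree = level4_defaults(tree, {"item": {"kind": "plain"}})
kinds = [item.get("kind") for item in tree.iter("item")]

print(texts, kinds)
```

The design point is only that each function takes the previous level's output and nothing else, so a product could merge the steps internally and still expose the same seams.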

I guess it opens up the "fragmentation" can of worms, but as a number of us have argued many times, we already have plenty of fragmentation along the axes of validating/non-validating, entity-expanding/non-entity-expanding, namespace-aware/non-namespace-aware, schema-aware/non-schema-aware ... some explicit layering scheme gives us a way to manage the fragmentation rather than deny it exists.