> -----Original Message-----
> From: Elliotte Rusty Harold [mailto:email@example.com]
> Sent: Monday, April 12, 2004 19:06
> To: firstname.lastname@example.org
> Cc: 'Michael Champion'; ''xml-dev' DEV'
> Subject: RE: [xml-dev] XML Binary Characterization WG public
> list available
> At 4:59 PM -0400 4/12/04, Bob Wyman wrote:
> >Elliotte Rusty Harold wrote:
> >> DOM and SAX at least can handle documents that do
> >> not have infosets, and I think XPath/XSLT can too
> > What does it mean to say that an XML document "does not have an
> >infoset"? I'm a bit puzzled by the words here. I thought
> that any XML
> >document could be described via an Infoset. What am I missing? (My
> >apologies if this is a silly question or if I've missed
> something very
> >fundamental in the XML specs...)
> There are at least two such cases, as John Cowan
> pointed out before:
> 1. Documents that are namespace malformed
> 2. Namespace well-formed documents that use relative namespace URIs
> I've encountered both of these in practice. There may be other cases,
> but these (or more specifically case 1) were what I was thinking of.
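To make case 1 concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not something taken from this thread: the sample document and the NameCollector class are hypothetical. It shows a namespace-unaware SAX parse accepting a document with an undeclared prefix, while a namespace-aware parser rejects the same bytes.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample: well-formed XML 1.0, but the prefix "a" is never
# declared, so the document is namespace malformed.
DOC = b"<a:b>text</a:b>"


class NameCollector(xml.sax.ContentHandler):
    """Records raw element names as reported by the SAX parser."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def sax_names(data):
    # xml.sax leaves namespace processing off by default, so the
    # undeclared prefix is passed through as part of the element name.
    handler = NameCollector()
    xml.sax.parseString(data, handler)
    return handler.names


def namespace_aware_parse_ok(data):
    # ElementTree's parser does namespace processing and raises
    # ParseError when it meets an unbound prefix.
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True
```

Under these assumptions, sax_names(DOC) reports the raw name with its prefix, and namespace_aware_parse_ok(DOC) is false until the prefix is declared.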
> The problem in the reverse direction is much larger though. There are
> many, many infosets that do not correspond to well-formed XML 1.0
> documents. This has been a major problem for various specifications
> based on the Infoset including XInclude. By starting with the
> infoset, rather than XML itself, technologies tend to lose track of
> some critical rules like "element names may not contain white space"
> or "an element may not have two attributes with the same name." These
> rules are not enforced in the infoset, only in real XML.
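The reverse problem can be sketched in the same way. The example below is only an illustration in Python, with ElementTree standing in for any in-memory tree API and round_trips being a hypothetical helper name. A tree built directly in memory can carry an element name containing white space, and the error only surfaces when the serialized result is handed back to an XML parser.

```python
import xml.etree.ElementTree as ET


def round_trips(element):
    """Serialize a tree, then check whether a parser accepts the result."""
    data = ET.tostring(element)
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


# ElementTree does not validate names when a tree is built in memory,
# so a name containing white space is accepted at this point.
bad = ET.Element("bad name")
good = ET.Element("good-name")
```

Here round_trips(good) succeeds and round_trips(bad) fails, because the rule about element names is enforced by the parser rather than by the tree model.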
The view taken in developing the "fast infoset" standard in ISO/ITU-T is to
consider the subset of infosets that have an XML representation and the
subset of XML documents that have an infoset. Further simplifications are
made, so the actual subset is smaller still. The objective is to
cover a very large number of infosets / XML documents that are of practical
interest, not to cover all possible XML documents and all possible infosets.
(Of course, some people will have a different opinion on what XML documents
are of practical interest.)
By doing so, we will have an alternative representation of an XML document
that is both more compact and faster to parse and create, but that can
easily be converted to and from XML. For all the XML documents that
are in the "subset" mentioned above, conversion is lossless, character for
character, except for such things as whitespace inside tags and the choice
of quote or apostrophe as the attribute delimiter.
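As a rough analogy for the kind of differences that are not preserved, the sketch below uses Python's standard-library canonicalizer (C14N 2.0, available from Python 3.8). This is not fast infoset and says nothing about its encoding. The sample documents and the same_content helper are hypothetical, and are only meant to show two documents that differ in whitespace inside tags and in quote style yet carry the same content.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents that differ only in white space inside tags
# and in the attribute quote character.
DOC_A = '<item  id="1"   ><name >x</name ></item >'
DOC_B = "<item id='1'><name>x</name></item>"


def same_content(xml_a, xml_b):
    # ET.canonicalize produces C14N 2.0 output, which normalizes
    # tag-internal white space and attribute quoting.
    return ET.canonicalize(xml_a) == ET.canonicalize(xml_b)
```

With these inputs, same_content(DOC_A, DOC_B) holds, while changing an attribute value or any character data makes the comparison fail.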
By the way, we are not calling this thing "XML something" or "binary XML".
We are calling it "fast infoset".
The specification is written in terms of the infoset, by providing an ASN.1
definition for each information item and item property, with some
simplifications (as mentioned above).
I hope you will agree that such a thing can be useful to many people,
although it is not XML, of course.
> The mapping between infosets and XML documents is neither 1-1 nor
> onto, even when lexical issues like white space inside tags is
> Remember, despite what people keep saying the infoset is *NOT* a data
> model for XML. It is *NOT* a replacement for the XPath data model,
> the DOM data model, or any other data model.
> Elliotte Rusty Harold
> Effective XML (Addison-Wesley, 2003)
The xml-dev list is sponsored by XML.org <http://www.xml.org>, an initiative
of OASIS <http://www.oasis-open.org>
The list archives are at http://lists.xml.org/archives/xml-dev/