   Re: Why the Infoset?

  • From: Michael Champion <Mike.Champion@softwareag-usa.com>
  • To: XMLDev list <xml-dev@lists.xml.org>
  • Date: Fri, 28 Jul 2000 21:28:52 -0400

----- Original Message -----
From: "Paul W. Abrahams" <abrahams@valinet.com>
To: "XMLDev list" <xml-dev@lists.xml.org>
Sent: Friday, July 28, 2000 8:16 PM
Subject: Re: Why the Infoset?
> Viewed as an elegant description of the information contained in an XML
> document, the Infoset makes sense.  But unlike the other XML specs, its
> normative effect is unclear.  If I'm implementing an XML-related processor
> of any variety, what does the Infoset require me to do that I would not have
> to do if the Infoset never existed?

It answers questions that are irrelevant when XML is viewed as a syntax, but
quite important to users of the DOM, XPath, XSL, etc. that operate on some
representation of a more abstract parsed XML document.  For example, the XML
spec says that "<empty></empty>" and "<empty/>" are both well formed XML
elements, but says nothing about whether they are equivalent.  Infoset says
(or at least the previous draft did) that they are.  Likewise, as was pointed
out earlier, InfoSet says that certain well-formed XML elements such as
"<ns::foo>blah</ns::foo>" do NOT have an unambiguous internal
representation. Without the InfoSet, it would be unclear if this is an
element named "foo" with a namespace prefix "ns", an element "foo" with a
prefix "ns:", or an element named "ns::foo". [OK, so shoot me if I've got a
detail wrong here ... I'm trying to illustrate the general point ;~) ]
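
To make the first point concrete, here is a rough sketch in Python using the
standard library's xml.etree.ElementTree.  The choice of parser and the
summarize() helper are my own assumptions for illustration; nothing here is
mandated by the Infoset itself.  It only shows that one ordinary parser hands
the application the same abstract element for both spellings.

```python
import xml.etree.ElementTree as ET

def summarize(xml_text):
    """Hypothetical helper: reduce a parsed element to the facts an
    application can actually observe through this API."""
    elem = ET.fromstring(xml_text)
    # Normalize missing text to "" so None and "" are not treated as different.
    return (elem.tag, elem.text or "", len(elem), dict(elem.attrib))

start_end_form = summarize("<empty></empty>")
empty_tag_form = summarize("<empty/>")

# Both spellings yield the same name, content, child count, and attributes.
print(start_end_form == empty_tag_form)
```

Other parsers and APIs may expose more or less detail, but code written
against this one has no way to tell which syntax the author typed.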

The lack of an InfoSet certainly made it much harder to invent the Level 1
DOM; it simply was not clear (and was highly contentious) whether expanded
entity references remained in the XML document tree or not... and how mixed
content would be represented in the tree.

Once the DOM and XPath were invented, subtle differences emerged in their
conceptions of what an abstract representation of an XML document looks like
... and there's always the "groves" model that underlies the HyTime and
DSSSL specs that provides yet another perspective on what an abstract SGML
document looks like.  While I personally fear that InfoSet [again, previous
drafts anyway] papers over these differences rather than clearly specifying a
single model, it definitely provides a much clearer notion of what an XML
parser produces, and what an XML API or transformer operates on, than would
exist in its absence.

So, one fairly practical normative question it *does* answer would be: 'My
application would like to treat "<empty></empty>" as signifying "data with
the value NULL" and "<empty/>" as signifying "no data".  Can I do this in an
environment where the XML will be processed by various tools that implement
the XML specs but that I do not control?'  The answer, for better or worse,
is NO - an XML processor is under no obligation to preserve this
distinction.  That answer comes from the InfoSet ... not the XML spec, the
DOM, XSLT, etc.
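
As a rough illustration of why relying on that distinction is risky, the
sketch below pushes both forms through a parse-and-reserialize step, again
with Python's xml.etree.ElementTree standing in for "a tool I do not control".
The round_trip() helper and the <record> wrapper element are assumptions made
up for the example.

```python
import xml.etree.ElementTree as ET

def round_trip(xml_text):
    """Hypothetical stand-in for an intermediate tool: parse the document,
    then write it back out as text."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")

null_form = round_trip("<record><empty></empty></record>")
no_data_form = round_trip("<record><empty/></record>")

# After one pass through the tool, the two inputs are indistinguishable.
print(null_form == no_data_form)
```

If the application needs to tell NULL from "no data", it is safer to encode
that in something the Infoset does carry, such as an attribute or the
presence or absence of the element.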

