   Re: Why the Infoset?

  • From: "Paul W. Abrahams" <abrahams@valinet.com>
  • To: XMLDev list <xml-dev@lists.xml.org>
  • Date: Fri, 28 Jul 2000 20:16:26 -0400

Earlier I wrote:

> What is the purpose of the XML Infoset?  Is it mainly
> intended to enlighten implementors about what the abstract
> structure of an XML document is, or does it have some other
> less obvious uses?   Are there other XML specs that refer to
> it in normative contexts, i.e., that would be ill-defined
> without the Infoset?   The XPath spec refers to it in a
> non-normative context but that doesn't count.

Len Bullard responded:

> IMO, it answers the question "what is a node?" in XML.
> This enables the information items to
> be independent of implementation and otherwise,
> reliable addressing without requiring "a specific
> interface or class of interfaces" (http://www.w3.org/TR/xml-infoset).
> Without it, a clear description of the
> properties required for well-formedness is
> difficult.  So yes, the abstract data set.

Seems to me that a "node" is a creature of the Infoset itself; without the
Infoset, one wouldn't even be asking the question (at least with respect to the
textual form of XML, which is the form that most of the world sees).

The statement "this enables the information items to be independent of
implementation" describes what the Infoset does.  But one could also say of the
textual form that it defines a syntax that's independent of implementation.  In
general, input is independent of implementation; it's output that depends on
implementation.

And doesn't the XML spec itself define well-formedness satisfactorily?

Steve Rowe responded:

> From section 1.1 of the XInclude 17-July-2000 Working Draft [1]:
>    [XInclude] defines a specific processing model for merging
>    information sets.
> From section 1.2:
>    XInclude operates on information sets and thus is orthogonal
>    to parsing.

Couldn't the effect of

  <myinclude xinclude:href="something.xml"/>

have been described in terms of XML textual forms rather than in terms of the
Infoset?  In other words, was the choice of the Infoset as the descriptive
mechanism logically necessary or just more convenient?
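
To make the two alternatives concrete, here is a minimal sketch in Python.
It is not the processing model of the XInclude draft: the namespace URI, the
in-memory DOCS table standing in for files, and the helpers merge_trees and
splice_text are all assumptions made up for illustration.  It contrasts a
merge done on parsed trees with a naive paste of the included file's
characters.

```python
import xml.etree.ElementTree as ET

XI_NS = "http://example.org/xinclude-sketch"   # hypothetical namespace URI
HREF = "{%s}href" % XI_NS

# Stand-in for the file system: the included document carries its own
# XML declaration, as a stand-alone file normally would.
DOCS = {"something.xml": "<?xml version='1.0'?><para>included text</para>"}

HOST = ('<doc xmlns:xinclude="%s">'
        '<myinclude xinclude:href="something.xml"/></doc>' % XI_NS)

def merge_trees(elem, load):
    """Replace each child carrying xinclude:href with the parsed root
    element of the referenced document (a tree-level merge)."""
    for i, child in enumerate(list(elem)):
        if HREF in child.attrib:
            included = ET.fromstring(load(child.attrib[HREF]))
            included.tail = child.tail
            elem[i] = included
        else:
            merge_trees(child, load)
    return elem

def splice_text(host_text, load):
    """Naive textual include: paste the referenced file's characters
    in place of the including tag."""
    tag = '<myinclude xinclude:href="something.xml"/>'
    return host_text.replace(tag, load("something.xml"))

merged = ET.tostring(merge_trees(ET.fromstring(HOST), DOCS.__getitem__),
                     encoding="unicode")
print(merged)

# The pasted text now has an XML declaration in the middle of the document.
spliced = splice_text(HOST, DOCS.__getitem__)
try:
    ET.fromstring(spliced)
    splice_ok = True
except ET.ParseError:
    splice_ok = False
print(splice_ok)
```

The tree-level merge goes through, while the naive paste fails to parse
because the included file's XML declaration lands in mid-document.  A purely
textual description could still be written, but it would have to say what
becomes of the declaration (and of encodings, DOCTYPEs, and so on), so the
sketch bears on convenience rather than on logical necessity.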

Darrin Bishop responded privately:

> The XML 1.0 spec is the rules to persist a logical document using a
> hierarchical tag structure. The true info is the logical document. How the
> logical document "looks" is based on the infoset.  True, XML 1.0 spec came
> way first and the infoset was created after the fact based on what the
> logical doc looked like before being persisted to stream or text.  The
> infoset lists the relationships between information items (i.e. elements and
> attributes).  The spec does not say anything about persisting itself. If I
> remember  correctly, it even states that <?xml version='1.0'> is not
> included in the infoset, that is a serialization issue.
> Once you define what is contained and the relations in the infoset, you can
> create new specs off that and not be tied to the XML 1.0 spec. For example
> you could create a new navigation spec different from XPath and if it
> conforms to the infoset spec, it should be usable no matter what
> serialization protocol is used.

Which is the horse and which the cart here?  Especially given its ancestry as a
more civilized form of SGML, XML is seen by the world as a set of textual
conventions for recording documents.  The Infoset is related to an
implementer's view of the abstract syntax tree.  But even then, I believe that
people were writing XML parsers, and therefore creating abstract syntax trees,
before the Infoset ever existed.

Looking at it another way, how would the XML world be poorer if the Infoset did
not exist?

Jonathan Borden also responded privately:

> XML is a serialization of a logical document structure defined by the XML
> Infoset. The Infoset uses the DOM as an API. If an XML document is defined
> by the character stream, the document is also defined by the SAX event
> stream (which may result from parsing the XML document, but also may result
> from another event source).
> In order to map MIME onto XML, I've used XMTP
> (http://www.openhealth.org/documents/xmtp.htm). At first it may appear that
> this is useful only for *actually* converting MIME text streams into XML
> text streams. Not so. Using the XML Infoset, XMTP also defines the
> generation of a series of SAX events as the result of parsing a MIME
> document, or a DOM interface that results from parsing a MIME document. In
> this case one might generate an (X)HTML page from an e-mail message using
> XSLT ***without actually ever generating an intermediate XML stream***
> Similarly one can interpret a directory structure as an XML Infoset (the XML
> uberdocument approach).

Again, I must ask about the horse and the cart.  If the Infoset had been
created before XML, then this view would be more understandable to me -- but
that's not the history.  In your use of XMTP to bypass the textual form of XML,
you don't really need the Infoset, especially since the Infoset itself allows
for customizations as to what is core and what is peripheral.   The Infoset, by
its own statement, doesn't require or favor a specific interface or class of
interfaces such as a tree structure.  So for XMTP you have to define some
specific interface, and that's what you'd be doing without the Infoset, I'd

Viewed as an elegant description of the information contained in an XML
document, the Infoset makes sense.  But unlike the other XML specs, its
normative effect is unclear.  If I'm implementing an XML-related processor of
any variety, what does the Infoset require me to do that I would not have to do
if the Infoset never existed?

Paul Abrahams

