   Re: Why the Infoset?

  • From: Eric Bohlman <ebohlman@netcom.com>
  • To: "Simon St.Laurent" <simonstl@simonstl.com>
  • Date: Tue, 01 Aug 2000 13:26:24 -0700 (PDT)

On Tue, 1 Aug 2000, Simon St.Laurent wrote:

> At 02:14 AM 8/2/00 +0800, Rick JELLIFFE wrote:
> >Can you give an example?  By XML processing application, do you mean "an
> >application that processes XML-encoded text" or "an application that
> >processes the results of XML-parsing an instance"?  The infoset only
> >needs cover the latter.
> Could you explain why you are so convinced that "The infoset only needs
> cover the latter"?
> I'm getting kind of dizzy here.  You've objected rather violently to Common
> XML and Minimal XML's subsetting of XML syntax, but you seem to insist on
> the Infoset only providing an abstraction of just such a subset,
> deliberately ignoring the rest.
> The 'usual SML suspects', Sean and myself, both seem to be arguing that the
> Infoset should be as inclusive as possible if it claims to represent XML 1.0.

AIUI, what Rick is saying is that in order for the infoset concept to be
useful, the relationship between syntactic instances of XML and their
infosets needs to be many-to-one, not one-to-one.  Otherwise the infoset
is merely a restatement of the sequence of individual characters in the
original instance, and there's no reason to define it at all, as the
original characters would serve the purpose just as well.

The whole point is that the infoset does *not* limit the *syntax* of XML
documents; rather it specifies what variations in syntax are "significant"
and what aren't.  Insignificant variations in syntax (such as the use of a
character reference rather than a literal character, or different orders
of attributes) map many-to-one into single infoset contributions.  This
isn't the same goal that things like SML and Common XML have, which is to
constrain the range of syntactically equivalent forms a document may use.
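
To make the many-to-one idea concrete, here is a rough sketch in Python.
It assumes the standard library's xml.etree.ElementTree as a stand-in for
"what the parser reports"; ElementTree is not an Infoset implementation,
and the helper canonical_view and the sample documents are made up purely
for illustration.

```python
import xml.etree.ElementTree as ET


def canonical_view(xml_text):
    """Reduce a parsed document to a nested tuple that can be compared.

    Attributes are sorted by name, so the order in which they were
    written in the source cannot affect the result.
    """
    def walk(elem):
        return (
            elem.tag,
            tuple(sorted(elem.attrib.items())),
            elem.text or "",
            tuple(walk(child) for child in elem),
            elem.tail or "",
        )
    return walk(ET.fromstring(xml_text))


# Hypothetical instances: doc_a and doc_b differ only in quoting style,
# attribute order, and character reference versus literal character.
doc_a = '<note lang="en" id="n1">caf&#233;</note>'
doc_b = "<note id='n1' lang='en'>café</note>"
# doc_c differs in an attribute value, which is a significant difference.
doc_c = '<note id="n2" lang="en">café</note>'

same = canonical_view(doc_a) == canonical_view(doc_b)
different = canonical_view(doc_a) != canonical_view(doc_c)
print(same, different)
```

The first two strings are different sequences of characters but compare
equal once parsed; the third does not.  Nothing here restricts what the
author of the document was allowed to type.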

Of course, there are always going to be certain applications that really
have to work with the lexical details of the syntactic instance rather
than its infoset; these are editor-type applications that need to preserve
aspects of the lexical (physical) structure of the original document.  For
example, if you have a book organized into a single "wrapper" containing
external entity references for each chapter, and you want to run it
through an application that inserts bibliographic information looked up
from <citation> elements in the source, you would *not* want the results
of the transformation to consist of one giant text file with every chapter
expanded inline.  But such applications will be a minority of those that
process XML.  The whole idea of the infoset is to enable parsers to
insulate applications from such details.
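
A small sketch of the flattening problem, again assuming Python's
xml.etree.ElementTree.  To keep it self-contained it uses an internal
entity (the made-up name ch1) in place of the external chapter entities
described above, so it only approximates the book scenario.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper document: the chapter text lives in an entity.
source = (
    '<!DOCTYPE book [<!ENTITY ch1 "Text of chapter one.">]>'
    '<book><chapter>&ch1;</chapter></book>'
)

# The parser expands the entity reference before the application sees it.
root = ET.fromstring(source)

# Writing the tree back out yields a single flat document: the entity
# declaration and the reference to it are both gone.
flattened = ET.tostring(root, encoding="unicode")
print(flattened)
```

An editor-type tool that has to write the wrapper and chapter files back
out separately would need access to exactly the lexical details that the
parser has hidden here.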



Copyright 2001 XML.org. This site is hosted by OASIS