OASIS Mailing List Archives

   Re: Why the Infoset?

  • From: Sean McGrath <sean@digitome.com>
  • To: Elliotte Rusty Harold <elharo@metalab.unc.edu>, xml-dev@lists.xml.org
  • Date: Thu, 03 Aug 2000 19:15:51 +0100

>>This is the sort of "particle physics" I think we need
>>beneath XML 1.
[Elliotte Rusty Harold]
>But there is a particle physics beneath the InfoSet that applications 
>can use if they like. It's called the stream. The particles are 
>bytes.  That may seem a little too fundamental to you, and you may 
>want something a little higher level. OK. But all we're doing here is 
>arguing about which layers of abstraction are useful.

The W3C infoset work seems to bless two levels
of abstraction:
	a) XML entities are a stream of bytes
	b) XML entities consist of elements, attributes, data ...
	   (all the stuff in the Infoset doc)

I see these two as sitting at opposite ends of a spectrum.
I see two other interesting points on that spectrum:

      bytes   tokens            infoset  uber-infoset
       (a)------(X)---------------(b)---------(Y)

(a) is comprehensive, but working at this level involves
parsing XML constructs from scratch. This is a lot of
work, as anyone who has ever written an XML parser will
tell you.

(b) is convenient for a broad class of applications but lossy.
Certain stuff is not visible. The stuff that is not visible
is lost if the application round-trips back to XML.

(X) This is the space where what SGML called "markup
sensitive" apps live. Apps that care about the
difference between "Hello world" and "&greeting;".
Apps that care about default attribute values, and so on.
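
To make the (X) distinction concrete, here is a minimal Python sketch.
It assumes the standard xml.etree.ElementTree module as a stand-in for
an infoset-level API, and a deliberately toy regular-expression
tokenizer (the names TOKEN and tokens are hypothetical, and it is not
a conforming XML tokenizer). The tree view cannot tell the two inputs
apart; the token view can.

```python
import re
import xml.etree.ElementTree as ET

# Internal subset declaring the entity used in the second document.
DTD = '<!DOCTYPE p [<!ENTITY greeting "Hello world">]>'
literal_body = "<p>Hello world</p>"
entity_body = "<p>&greeting;</p>"

# Infoset-style view: the parser expands the entity reference,
# so both documents yield the same character data.
literal_text = ET.fromstring(literal_body).text
entity_text = ET.fromstring(DTD + entity_body).text

# Toy token-level view (an illustration only): entity references,
# tags and runs of text are kept as separate tokens.
TOKEN = re.compile(r"&\w+;|<[^>]+>|[^<&]+")

def tokens(markup):
    """Split markup into entity-reference, tag and text tokens."""
    return TOKEN.findall(markup)

print(literal_text == entity_text)   # same at the tree level
print(tokens(literal_body))
print(tokens(entity_body))           # the reference survives as a token
```

A markup-sensitive app would work from something like the second
view, where "&greeting;" is still visible as a reference.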

(Y) This is the space where high fidelity roundtripping apps
live. Apps that care about the difference between:
	<name first = "Sean" last = "Mc Grath"/>
and
	<name
		last = 'Mc Grath'
		first = 'Sean'></name>
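
As a rough illustration of why (Y) needs more than the infoset, the
following Python sketch parses both spellings of the name element with
the standard xml.etree.ElementTree module (again only an assumed
stand-in for an infoset-level API; the helper name view is
hypothetical). The two parse to the same tree, and serializing that
tree does not give back the original source text.

```python
import xml.etree.ElementTree as ET

src_a = '<name first = "Sean" last = "Mc Grath"/>'
src_b = "<name\n\tlast = 'Mc Grath'\n\tfirst = 'Sean'></name>"

elem_a = ET.fromstring(src_a)
elem_b = ET.fromstring(src_b)

def view(elem):
    """What a tree-level application sees: tag, attributes, text, children."""
    return (elem.tag, dict(elem.attrib), elem.text, len(elem))

# Quote style, attribute order, whitespace inside the tag and the
# empty-element form are all invisible at this level.
same_tree = view(elem_a) == view(elem_b)

# Round-tripping regenerates the markup in the serializer's own style,
# so the original spelling of each document is lost.
out_a = ET.tostring(elem_a, encoding="unicode")
out_b = ET.tostring(elem_b, encoding="unicode")

print(same_tree)
print(out_a)
print(out_b)
```

A high-fidelity round-tripping app would have to carry that lexical
detail alongside the tree, which is the extra information the
"uber-infoset" end of the spectrum stands for.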

(b) is where the W3C infoset lives. It seems to me to
be closest to what SGML called "structure controlled" apps.

I am worried that by blessing a single infoset, the W3C are
leaving big holes in areas (X) and (Y), where a lot of
important XML data processing goes on.

There need to be N infosets (N > 1) to cover
the range of application types people build with XML.

How that comes to pass remains to be seen. For now,
I would be delighted if the W3C simply *renamed* the
infoset to be something more familial like the "structure
controlled XML infoset" so that it is obvious to readers
where in the spectrum of possible XML infosets it lives.


http://www.pyxie.org - an Open Source XML Processing library for Python



Copyright 2001 XML.org. This site is hosted by OASIS