- From: Sean McGrath <firstname.lastname@example.org>
- To: Elliotte Rusty Harold <email@example.com>, firstname.lastname@example.org
- Date: Thu, 03 Aug 2000 19:15:51 +0100
>>This is the sort of "particle physics" I think we need
>>beneath XML 1.
[Elliotte Rusty Harold]
>But there is a particle physics beneath the InfoSet that applications
>can use if they like. It's called the stream. The particles are
>bytes. That may seem a little too fundamental to you, and you may
>want something a little higher level. OK. But all we're doing here is
>arguing about which layers of abstraction are useful.
The W3C infoset work seems to bless two levels:
a) XML entities are a stream of bytes
b) XML entities consist of elements, attributes, data ...
(all the stuff in the Infoset doc)
I see these two as being on opposite sides of a spectrum.
I see two other interesting foci on that spectrum:
bytes tokens infoset uber-infoset
(a) ------(X)------------- (b)--------(Y)
(a) is comprehensive, but working at this level involves
parsing XML constructs from scratch. This is a lot of
work, as anyone who has ever written an XML parser will know.
(b) is convenient for a broad class of applications but lossy.
Certain stuff is not visible. The stuff that is not visible
is lost if the application round-trips back to XML.
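To make the loss at (b) concrete, here is a minimal sketch. It assumes Python's standard-library ElementTree as a stand-in for a structure-level API (the library choice is illustrative only). The sample document and the round_trip helper are hypothetical. CDATA section boundaries and character references have no place in the Infoset either, so neither survives the trip.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a CDATA section and a character reference,
# neither of which is represented in a structure-level tree.
SOURCE = "<doc><code><![CDATA[if (a < b)]]></code><p>&#65;BC</p></doc>"

def round_trip(xml_text: str) -> str:
    """Parse into an element tree, then serialize straight back to XML."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    # The CDATA wrapper and the &#65; reference are gone; only
    # equivalent (re-escaped) character data remains.
    print(round_trip(SOURCE))
```

Run as-is, it prints `<doc><code>if (a &lt; b)</code><p>ABC</p></doc>`. The data is equivalent, but the markup the author actually wrote is not recoverable.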
(X) This is the space where what SGML called
"markup sensitive" apps live. Apps that care about the
difference between "Hello world" and "&greeting;".
Apps that care about default attribute values etc. etc.
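A minimal sketch of the "Hello world" vs "&greeting;" distinction, again assuming Python's standard library. The two helper names and the sample DOCTYPE are hypothetical. ElementTree stands in for level (b). Level (X) is approximated with expat: installing `DefaultHandler` (as opposed to `DefaultHandlerExpand`) stops expat from expanding internal entities and hands each reference to the handler verbatim.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

# Hypothetical document declaring an internal general entity.
DOC = ('<!DOCTYPE p [<!ENTITY greeting "Hello world">]>'
       "<p>&greeting;</p>")

def structure_view(xml_text: str):
    """Level (b): the parser expands &greeting; before the app sees it."""
    return ET.fromstring(xml_text).text

def markup_view(xml_text: str) -> list:
    """Nearer level (X): collect raw chunks from expat's default handler.

    Setting DefaultHandler (not DefaultHandlerExpand) turns off expansion
    of internal entities; each reference is passed through unexpanded.
    """
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.DefaultHandler = chunks.append
    parser.Parse(xml_text, True)
    return chunks

if __name__ == "__main__":
    print(structure_view(DOC))
    print("&greeting;" in markup_view(DOC))
```

The structure view only ever sees the expanded text. The markup view still sees the reference itself, which is what a markup-sensitive app needs.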
(Y) This is the space where high fidelity roundtripping apps
live. Apps that care about the difference between:
<name first = "Sean" last = "Mc Grath"/>
and
<name
   last = 'Mc Grath'
   first = 'Sean'></name>
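A minimal sketch of why (Y) needs more than (b), assuming Python's standard library. The helper names are hypothetical. At the structure level the two spellings of the name element compare equal: same tag, same attribute set, no content. Only a raw-markup layer (again approximated here with expat's default handler, which receives every construct unparsed when no other handlers are set) still sees the quoting, the whitespace and the empty-tag form.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

# The two spellings of the same element quoted in the post.
FORM_1 = '<name first = "Sean" last = "Mc Grath"/>'
FORM_2 = "<name\n   last = 'Mc Grath'\n   first = 'Sean'></name>"

def same_structure(a: str, b: str) -> bool:
    """Compare at level (b): tag, attribute set, text and child count."""
    ea, eb = ET.fromstring(a), ET.fromstring(b)
    return ((ea.tag, ea.attrib, ea.text, len(ea)) ==
            (eb.tag, eb.attrib, eb.text, len(eb)))

def raw_markup(xml_text: str) -> str:
    """Nearer level (Y): rebuild the text from expat's default handler,
    which receives each tag unparsed when no other handlers are set."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.DefaultHandler = chunks.append
    parser.Parse(xml_text, True)
    return "".join(chunks)

if __name__ == "__main__":
    print(same_structure(FORM_1, FORM_2))
    print(raw_markup(FORM_2))
```

Comparing `attrib` as a dict ignores order on purpose. The Infoset models attributes as an unordered set, so nothing at level (b) is obliged to remember that `last` came before `first`.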
(b) is where the W3C infoset lives. It seems to me to
be closest to what SGML called "structure controlled" apps.
I am worried that by blessing a single infoset, the W3C are
leaving big holes in areas (X) and (Y), where a lot of
important XML data processing goes on.
There need to be N infosets (N > 1) to cover
the range of application types people build with XML.
How that comes to pass remains to be seen. For now,
I would be delighted if the W3C simply *renamed* the
infoset to something more familial, like the "structure
controlled XML infoset", so that it is obvious to readers
where in the spectrum of possible XML infosets it lives.
http://www.pyxie.org - an Open Source XML Processing library for Python