10/31/2002 7:36:04 AM, Elliotte Rusty Harold <email@example.com> wrote:
A good topic for Samhain (aka Halloween), when the boundary between
the worlds -- of life and death, syntax and data models -- becomes blurred :-)
>It's the infoset's fault that it doesn't mandate simple
>well-formedness. I have no objection to synthetic infosets or
>non-text, internal representations of the Infoset such as a DOM Document
>object. I object when those representations do not adhere to the same
>basic rules XML 1.0 does.
Uhh, the Infoset spec itself says: "This specification defines an abstract data set called the
XML Information Set (Infoset). Its purpose is to provide a
consistent set of definitions for use in other specifications
that need to refer to the information in a well-formed XML document."
I fully agree that it would be nice for Someone (I despair of this
being the W3C) to formally describe the implicit data model in
XML. The trouble is, some people deny that it exists, and most people who
try to take a stab at this hit quicksand quickly. ("Can there
be adjacent Text nodes? What about CDATA sections, unexpanded
entity references, and other syntax sugar?") Then there are Namespaces,
whose Giant Sucking Sound scares away all but the bravest
explorers of this space. Then there's the "PSVI" stuff (even XML 1.0
constructs such as attribute types and default attribute values
arguably are part of the PSVI). Not to mention XInclude and the
lack of a common processing model saying when it is applied!
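To make the quicksand a bit more concrete, here is a small Python sketch using only the standard library (xml.dom.minidom and xml.etree.ElementTree). The element name and text are invented for illustration. It builds a tree with two adjacent Text nodes, shows that it serializes to the same characters as the normalized tree, and shows that ElementTree reports identical content for a CDATA section and an escaped character. Which of these trees is "the" data model is exactly the open question.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

# Build <a> with two adjacent Text nodes by hand (a "synthetic" tree).
doc = minidom.parseString("<a/>")
root = doc.documentElement
root.appendChild(doc.createTextNode("syntax "))
root.appendChild(doc.createTextNode("sugar"))

nodes_before = len(root.childNodes)   # two separate Text nodes
xml_before = root.toxml()

root.normalize()                      # merge adjacent Text nodes

nodes_after = len(root.childNodes)    # a single Text node
xml_after = root.toxml()

# Two different trees, one serialization.
same_syntax = (xml_before == xml_after)

# ElementTree drops the CDATA-versus-escaping distinction entirely.
cdata_text = ET.fromstring("<a><![CDATA[x<y]]></a>").text
escaped_text = ET.fromstring("<a>x&lt;y</a>").text
same_text = (cdata_text == escaped_text)

print(nodes_before, nodes_after, same_syntax, same_text)
```

Two APIs in the same standard library already give different answers here: the DOM keeps the node boundaries, ElementTree does not.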
Since everyone who looks at this comes up with a different answer,
there's no answer that will satisfy everyone (one reason the Infoset
spec is so, uhh, non-directive I believe).
I personally (taking all my hats off!!) think that a single data model ought
to be described and what we call "XML" redefined on top of that single
data model. Syntax sugar is fine, but it probably ought to be resolved in
a pre-parser akin to the C preprocessor that produces a canonical
syntax that could be the basis for true interoperability at the syntax
level. Parsers (of this canonical syntax or of any number of "little languages"
and alternate syntaxes) that produce data structures that logically conform to the
single data model could be considered to be "XML", and processed with
XSLT, queried with XQuery, passed around via SOAP, etc.
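Something in the spirit of that pre-parser already exists in W3C Canonical XML. As a rough sketch (assuming Python 3.8 or later, whose standard library provides xml.etree.ElementTree.canonicalize; the two sample documents are made up), two sugared spellings of the same content reduce to one canonical form. This only approximates the idea above; it is not a definition of the single data model being asked for.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same content: different attribute order and
# quoting, and a CDATA section versus an escaped character.
sugared = "<a b='1' a=\"2\"><![CDATA[x<y]]></a>"
plain = '<a a="2" b="1">x&lt;y</a>'

# canonicalize() returns the canonical form as a string
# when no output file is given.
canon_sugared = ET.canonicalize(xml_data=sugared)
canon_plain = ET.canonicalize(xml_data=plain)

print(canon_sugared == canon_plain)
```

A parser that only ever saw the canonical form would not need to know about CDATA sections or attribute quoting at all, which is the interoperability point being made.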
All this is not going to happen until the cruft overwhelms us, and
so far people have dealt with the cruft by ad hoc profiles (e.g., the
one SOAP uses) and implicit agreement on what the specs really mean.
We shall see if that suffices in the long run.