




   Re: Why the Infoset?

  • From: Rick JELLIFFE <ricko@geotempo.com>
  • To: ",XML DEV" <xml-dev@lists.xml.org>
  • Date: Sat, 29 Jul 2000 19:50:45 +0800

"W. E. Perry" wrote:
> "Paul W. Abrahams" wrote:
> > Looking at it another way, how would the XML world be poorer if the Infoset did
> > not exist?
> It is far worse than that, I fear. The Infoset is the cuckoo's egg in the XML nest.
> The fundamental innovation of XML 1.0 was the concept of well-formedness, which as
> a radical insight amounts to this: the instance text--that is, content plus
> markup--is entirely self-sufficient both as syntax and as the basis for derived or
> elaborated semantics.

I disagree. The basis of SGML'86 was that a rooted, directed, cyclic
graph with an attribute-value tree framework, allowing a handful of
general types on the edges (child, parent, next, attribute, IDREF, etc.)
and a handful of general types on the named nodes (element, comment, PI,
etc.), was, when coupled with a simultaneous rooted graph of entities
with a handful of general types (NDATA, SGML), sufficient for an
enormous number of complex problems. On top of this information model
came the need to cope with an enormous number of possible notations and
syntaxes.
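To make the shape of that information model concrete, here is a minimal Python sketch (not SGML's own formalism): typed nodes, typed edges (child, parent, next, attribute, IDREF) and a separate rooted graph of entities. The class and function names, the ATTRIBUTE node type standing in for the "etc", the sample document and the "gif" notation name are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class NodeType(Enum):
    """A handful of general node types (ATTRIBUTE is an assumed extra)."""
    ELEMENT = "element"
    ATTRIBUTE = "attribute"
    COMMENT = "comment"
    PI = "pi"


class EdgeType(Enum):
    """A handful of general edge types, as listed in the post."""
    CHILD = "child"
    PARENT = "parent"
    NEXT = "next"
    ATTRIBUTE = "attribute"
    IDREF = "idref"


@dataclass(eq=False)  # identity semantics, since the graph is cyclic
class Node:
    kind: NodeType
    name: str
    value: Optional[str] = None
    edges: List[Tuple[EdgeType, Node]] = field(default_factory=list, repr=False)

    def link(self, edge: EdgeType, target: Node) -> None:
        self.edges.append((edge, target))

    def follow(self, edge: EdgeType) -> List[Node]:
        return [target for kind, target in self.edges if kind is edge]


def add_child(parent: Node, child: Node) -> None:
    """Wire child and parent edges, plus a next edge from the previous sibling."""
    siblings = parent.follow(EdgeType.CHILD)
    if siblings:
        siblings[-1].link(EdgeType.NEXT, child)
    parent.link(EdgeType.CHILD, child)
    child.link(EdgeType.PARENT, parent)


class EntityType(Enum):
    SGML = "sgml"    # parsed text entity
    NDATA = "ndata"  # non-SGML data, interpreted through a notation


@dataclass(eq=False)
class Entity:
    """Node in the separate, simultaneous rooted graph of entities."""
    name: str
    kind: EntityType
    notation: Optional[str] = None
    children: List[Entity] = field(default_factory=list)


# Hypothetical document: <doc><p id="intro"/><p ref="intro"/></doc>
doc = Node(NodeType.ELEMENT, "doc")
intro = Node(NodeType.ELEMENT, "p")
note = Node(NodeType.ELEMENT, "p")
add_child(doc, intro)
add_child(doc, note)
intro.link(EdgeType.ATTRIBUTE, Node(NodeType.ATTRIBUTE, "id", value="intro"))
note.link(EdgeType.ATTRIBUTE, Node(NodeType.ATTRIBUTE, "ref", value="intro"))
note.link(EdgeType.IDREF, intro)  # cross-link: this is what makes it a graph

# Hypothetical entity graph: a document entity pulling in text and a figure.
root_entity = Entity("document", EntityType.SGML)
root_entity.children.append(Entity("chapter1", EntityType.SGML))
root_entity.children.append(Entity("fig1", EntityType.NDATA, notation="gif"))
```

Following a child edge and then a parent edge returns to the starting node, and the IDREF edge links siblings outside the tree, which is the "cyclic graph over an AV-tree" in miniature.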

XML's WF-only is not "entirely" self-sufficient for anything to do with
[...] (The problem is only fixed by hard-coding, i.e. XLink, and
hard-coding requires universal names, i.e. Namespaces using some
public-identifier-like registration system, i.e. URIs.)

What XML did was to say that a lot of users only need simple AV-trees,
so let's allow them to have them with little fuss. And it said that
people could agree on a syntax.

> Since no DTD nor other content
> model or pre-ordained schema is required for the parsing, and therefore the
> interpretation, of the resulting instance document, it is not necessary to secure
> anyone's agreement to the extension of the content model before simply extending
> the markup vocabulary of the instance document. 

I think this is too hard on schemas: what a schema does, in part, is
specify what additional constraints the document satisfies beyond WF
XML. These constraints allow data to be handled more efficiently: if I
know that my content model is (a,b,c)+ and that it is closed, then I
can allocate a record with three slots for each a, b, c group, and I
know that the XPath  a[position()=1]  on the parent will always succeed.
If I know that an element is a date, I can store it in a database as a
date, not a string. If I know that a value or combination of values is
unique, I can use it as a key for faster access to data.
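A minimal Python sketch of those three payoffs, assuming a hypothetical schema that declares the closed model (a,b,c)+, says <b> holds an ISO date, and says <a> values are unique. A regular expression over child element names stands in for a real DTD or schema validator; it is an illustration, not how any particular validator works.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date
from typing import Dict, Tuple

# Hypothetical schema facts that a DTD or schema would normally declare:
#   content model of the parent: (a,b,c)+, closed
#   <b> is a date, <a> values are unique keys
CONTENT_MODEL = re.compile(r"(?:a b c )+")


def conforms(parent: ET.Element) -> bool:
    """Check the children of `parent` against the closed content model."""
    names = "".join(child.tag + " " for child in parent)
    return CONTENT_MODEL.fullmatch(names) is not None


def load(parent: ET.Element) -> Dict[str, Tuple[date, str]]:
    """Build a keyed index of typed records, trusting the validated model."""
    if not conforms(parent):
        raise ValueError("children do not match (a,b,c)+")
    children = list(parent)
    index: Dict[str, Tuple[date, str]] = {}
    # The model is closed, so every group of three children is exactly
    # a, b, c: fixed three-slot records, no per-child tag dispatch needed.
    for a, b, c in zip(children[0::3], children[1::3], children[2::3]):
        if a.text in index:  # the uniqueness constraint makes <a> a usable key
            raise ValueError(f"duplicate key {a.text!r}")
        # Typed as a date, so it is stored as a real date, not a string.
        index[a.text] = (date.fromisoformat(b.text), c.text)
    return index


rows = ET.fromstring(
    "<rows><a>k1</a><b>2000-07-29</b><c>first</c>"
    "<a>k2</a><b>2000-07-30</b><c>second</c></rows>"
)
index = load(rows)
print(index["k2"])             # (datetime.date(2000, 7, 30), 'second')
# ElementTree's a[1] is its form of a[position()=1]; validation guarantees a hit.
print(rows.find("a[1]").text)  # k1
```

Without the schema, the same consumer would have to inspect every child's tag, keep dates as strings, and check for duplicates with no assurance that a key exists at all.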

The idea of a syntax with no schema/DTD is hardly new: in part, it was
the infelicities of such syntaxes that caused SGML'86 to take such a
strong and radical view. If anyone can put any element anywhere, how
can a consumer contractually require an information producer to produce
certain information? LISP or Ada or any of the languages with [...] and
nameable parameters provide the same basic capabilities as WF XML: why
are they not good markup languages? It is the ability to constrain data
by schemas that is the [...]

If data were all atomic, each datum were described by a universal name,
and no two documents were similar, then I think William's view would be
pretty close to the mark: documents could be ad hoc assemblages of
elements used by applications which handled each document as best they
could, perhaps with the aid of private schemas to check that all the
required information was present. But truth is not atomic: a number may
be complex, a quantity will have a unit as well as a value, a table has
rows, and love and marriage go together like a horse and carriage.

The other problem with William's view is the assumption that documents
don't come in runs of fairly similar documents: document types. A schema
is a way for the generator of a document to tell its consumer the rules
it has followed. A good schema language can let the consumer of the
document know such things as:
 * "If I delete element X here, should I expect all other systems that
   are in the loop to still process the document correctly?"
 * "If I add element Y here, will that break other people's systems?"
 * "Do I really need to check that condition Z holds at this point, or
   can I trust the generic contract-checking system?"
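As a rough illustration of the first two questions, a producer holding the same schema as its consumers can test a proposed edit before shipping it. This Python sketch assumes a hypothetical closed memo model (to, from, date?, body); the function names are invented for the example, and a regular expression over child names again stands in for a real schema validator.

```python
import re
from typing import List

# Hypothetical shared content model for <memo>: (to, from, date?, body), closed.
MEMO_MODEL = re.compile(r"to from (?:date )?body ")


def conforms(children: List[str]) -> bool:
    """Does this sequence of child element names satisfy the shared contract?"""
    return MEMO_MODEL.fullmatch("".join(name + " " for name in children)) is not None


def safe_to_delete(children: List[str], name: str) -> bool:
    """Would deleting the first <name> child keep every consumer's contract?"""
    edited = list(children)
    edited.remove(name)  # raises ValueError if there is no such child
    return conforms(edited)


def safe_to_insert(children: List[str], position: int, name: str) -> bool:
    """Would inserting a <name> child at `position` keep the contract?"""
    edited = list(children)
    edited.insert(position, name)
    return conforms(edited)


memo = ["to", "from", "date", "body"]
print(safe_to_delete(memo, "date"))         # True: date is optional
print(safe_to_delete(memo, "to"))           # False: to is required
print(safe_to_insert(memo, 3, "priority"))  # False: the model is closed
```

The point is that the answer comes from the shared schema, not from asking every downstream system; a generic checker like this is what lets the third question be answered with "trust it".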

Rick Jelliffe



Copyright 2001 XML.org. This site is hosted by OASIS