OASIS Mailing List Archives





   Re: RE: [xml-dev] heritage (was Re: [xml-dev] SGML on the Web)


10/8/2002 9:31:23 AM, "Michael Kay" <michael.h.kay@ntlworld.com> wrote:

 have an underlying data model. 
>The problem is that it should have an underlying model, but it hasn't:
>it only has an "overlying" model (the InfoSet) that is retrofitted to the
>syntax. The fact that the model is retrofitted rather than being a
>normative part of XML means that questions like "are comments
>significant" have never been satisfactorily answered. Even the new
>versions of the specs (XML 1.1 and Namespaces 1.1) do not refer
>normatively to the InfoSet, so these questions remain debatable. And
>the confusion over marginally-significant stuff like CDATA sections,
>namespace prefixes, and inter-element whitespace continues to cause
>interoperability nightmares. If people had defined the model before
>defining the syntax we wouldn't be in this mess.

I completely agree.  The DOM implicit data model tries to be 
inclusive in exposing "syntax sugar" because it was driven by
the requirements of editor vendors who need to expose that level
of control. "Overlying" data model is a good description of the
infoset, which is designed more to describe what parsers
produce than to prescribe what syntax should be significant.  The
XPath/XSLT data model was the first to start the job of triage on
syntax sugar that should dissolve when parsed, and since its
data model is read-only, it doesn't have to worry about round-tripping
the way the DOM data model does.  This is a mess, indeed.  
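
To make the difference concrete, here is a rough sketch using two parsers from the Python standard library (my choice for illustration, nothing more): xml.dom.minidom as a DOM-style model and xml.etree.ElementTree as a model that lets CDATA sections dissolve into plain text. The two sample documents and the helper names are made up for the example.

```python
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET

# Two documents that differ only in "syntax sugar": an escaped
# character versus a CDATA section (illustrative samples).
ESCAPED = "<a>x &lt; y</a>"
CDATA = "<a><![CDATA[x < y]]></a>"


def dom_child_types(text):
    """Return the DOM node types of the root element's children."""
    doc = minidom.parseString(text)
    return [child.nodeType for child in doc.documentElement.childNodes]


def etree_text(text):
    """Return the root element's text as ElementTree models it."""
    return ET.fromstring(text).text


# The DOM keeps the distinction, so an editor could round-trip it.
print(dom_child_types(ESCAPED) == dom_child_types(CDATA))

# ElementTree reports the same text for both; the CDATA section is gone.
print(etree_text(ESCAPED) == etree_text(CDATA))
```

Neither answer is wrong; they are two different decisions about which syntax is significant, which is exactly the question the XML spec itself never settles.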

I think, however, that the reason we are in this mess is that there is a 
"heritage" in SGML, carried over in SAX, and now in LMNL, that
markup really is Just Syntax, and data models are something for the application
to define.  That's not a problem per se -- obviously lots of people get
real work done in that paradigm -- just that it doesn't fit into the
world of Dynamic HTML scripters, generic XML authoring tools, generic
XML transformation languages, generic XML DBMS systems, etc.  A DBMS
has to take a stand on whether entities are expanded or unexpanded before
indexing; it has to decide whether to preserve CDATA sections and comments,
etc.  So, I can agree that "if people had defined the model before delivering
the syntax" then WE (the generic data model-oriented subculture) wouldn't be
in this mess, but then the "it's just syntax" people wouldn't have come along
on the XML parade.  



Copyright 2001 XML.org. This site is hosted by OASIS