OASIS Mailing List Archives

RE: Syntax Sugar and XML information models

DTDs are missing from the InfoSet.
They are probably the most useful 'missing' item.
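To make the DTD point concrete, here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree. It stands in for a typical parser-level data model and is not an InfoSet implementation. The sample document and its entity name "org" are hypothetical. After a parse-and-serialize round trip the DOCTYPE and its declarations are gone, and the parsed entity reference has been replaced by its expansion.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: its internal DTD subset declares the
# element type and a parsed (internal general) entity named "org".
SOURCE = """<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!ENTITY org "OASIS">
]>
<note>Posted to &org; xml-dev</note>"""


def round_trip(text: str) -> str:
    """Parse text with ElementTree, then serialize the tree it built."""
    root = ET.fromstring(text)  # the parser expands &org; while parsing
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Expected: <note>Posted to OASIS xml-dev</note>  (no DOCTYPE, no &org;)
    print(round_trip(SOURCE))
```

An editor or database that needs the declarations or the entity reference back would have to keep that information outside a model like this one.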

-Wayne Steele

>From: Michael Champion <mike.champion@softwareag-usa.com>
>To: xml-dev <xml-dev@lists.xml.org>
>Subject: RE: Syntax Sugar and XML information models
>Date: Wed, 28 Mar 2001 21:13:01 -0500
> >
> > > Conceptually, perhaps we have:
> > >
> > > The "Syntax Sugar InfoSet" (SSIS) that exposes everything worth
> > > round-tripping
> > > in the XML syntax... [even different quote characters
> > and whitespace???]
> >
> > That list could be endless - you did not even mention attribute order.
>Well, that's the nub of the issue here:  The W3C InfoSet is widely
>interpreted as decreeing that everything not in the InfoSet is "mere syntax
>sugar". Some of these distinctions are clearly rooted in the XML spec and
>existing practice, such as the fact that the order of attributes is
>insignificant, the type of quotation marks around attribute values is
>insignificant, etc.  Others are more controversial, such as CDATA sections.
>[For example, would you really want your XML database to take in XML
>documents with scripts escaped with CDATA sections and return them escaped
>with &lt; etc.?]
>Others really MUST be interpreted differently by authoring tools than the
>InfoSet specifies: for example, the whole POINT of parsed entities is lost
>if an editor doesn't round-trip them; likewise a database should either let
>its client resolve external entities, or resolve them at retrieval time
>rather than storage time.  (Entities are the only mechanism supported in a
>Recommendation that enables control of redundant information ...).
>So, there seem to be two classes of things that the InfoSet doesn't cover:
>the "mere syntax" that no reasonable application (except maybe a "diff")
>would care about, and the gray area stuff that some XML tools must care
>about but that the InfoSet says nothing about.  My suggestion is to make
>this distinction more formally, based on input from the folks "in the
>trenches" about which details of XML syntax are "significant" and which
>aren't.  Maybe there is an endless list of things that some people care
>about and some don't, but I'd at least like to see some discussion before
>giving up.
>So, does ANYBODY care about round-tripping a) the specific quote characters
>around attribute values, b) the order of attributes, c) character entity
>references for characters that are in the specified character set, d) the
>different syntaxes for empty elements, .... ?  Are there other bits that the
>InfoSet doesn't represent but have some practical significance to real
>applications? (Let's not discuss whitespace ... the complexities there are
>well-known and too painful to think about).
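The candidate "syntax sugar" items in the question above can be checked against one concrete parser. This minimal sketch again assumes Python's xml.etree.ElementTree as a stand-in data model (not an InfoSet implementation) and uses hypothetical sample markup. It parses several spellings of one element that differ only in quote characters, attribute order, character references, empty-element syntax, or a CDATA section, and compares what survives. Attributes are compared as a dict, which ignores order, mirroring the InfoSet's unordered [attributes] property.

```python
import xml.etree.ElementTree as ET

# Hypothetical spellings of one element; each differs from the baseline
# only in a detail the thread lists as possible "mere syntax sugar".
VARIANTS = {
    "baseline": '<p a="1" b="2">x &lt; y<br/></p>',
    "quote characters": "<p a='1' b='2'>x &lt; y<br/></p>",
    "attribute order": '<p b="2" a="1">x &lt; y<br/></p>',
    "character references": '<p a="&#49;" b="2">&#120; &lt; y<br/></p>',
    "empty-element syntax": '<p a="1" b="2">x &lt; y<br></br></p>',
    "CDATA section": '<p a="1" b="2"><![CDATA[x < y]]><br/></p>',
}


def view(text: str):
    """Reduce parsed markup to the parts ElementTree keeps."""

    def walk(el):
        # dict(...) equality ignores attribute order.
        return (el.tag, dict(el.attrib), el.text, el.tail,
                [walk(child) for child in el])

    return walk(ET.fromstring(text))


def compare_to_baseline(variants: dict) -> dict:
    """Map each variant name to True if it parses to the baseline's view."""
    baseline = view(variants["baseline"])
    return {name: view(text) == baseline for name, text in variants.items()}


if __name__ == "__main__":
    for name, same in compare_to_baseline(VARIANTS).items():
        print(f"{name}: same as baseline = {same}")
```

Every variant compares equal, so none of these distinctions can be round-tripped from this model alone. One caveat: since Python 3.8 ElementTree's serializer does keep attributes in source order, so treating order as insignificant here is a modeling choice, not something the library forces.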
