> -----Original Message-----
> From: John Aldridge [mailto:john.aldridge@informatix.co.uk]
> Sent: Thursday, March 29, 2001 5:35 AM
> To: xml-dev
> Subject: RE: Syntax Sugar and XML information models
>
> (b) Editors all write a _standard_ normal form (i.e. not just
> a normal form of their own choosing)

This is more or less what I was hoping we could collectively define, and
"standard normal form" sounds a lot better than "Syntax Sugar Information
Set." And to answer Rick Jelliffe's question, I agree that the W3C InfoSet
is a reasonable model for what people care about when navigating or
transforming a document, but we need a richer model for editors and
databases. (These are two sides of the same coin, since a database must
round-trip whatever is significant to an editor, and an editor must
preserve whatever is significant to a database.)

BUT I'm not sure I agree "that means you are *not* interested in the
information set of the document, but the actual text of the document's
entities. That is a fine thing. Let there be element-based (infoset)
editors and entity-based (tag-aware) editors". Databases (and arguably
editors) *should* be interested in the information set of a document
rather than just the bytes that make it up, but they need a richer
information set than the W3C InfoSet.

I'm hoping to find a middle ground between "editors and databases must
simply round-trip the (core) infoset" and "editors and databases must
round-trip every single character". My first cut at this is that the
"standard normal form" is Canonical XML + external entity references +
CDATA sections ... I'm sure there is more.
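
As an illustration of why Canonical XML alone would not cover this (a
minimal sketch, not part of the original proposal): the snippet below
assumes Python 3.8+ and its standard xml.etree.ElementTree.canonicalize,
which implements C14N 2.0 rather than the Canonical XML recommendation
current in 2001. The sample documents are hypothetical. It suggests that
canonicalization replaces a CDATA section with ordinary escaped character
data, so an editor's choice to use CDATA would not survive a round trip.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical sample documents, chosen only for illustration.
with_cdata = "<doc><![CDATA[a < b]]></doc>"
with_escape = "<doc>a &lt; b</doc>"

# Canonicalize both spellings of the same character content.
canon_cdata = canonicalize(with_cdata)
canon_escape = canonicalize(with_escape)

# Both spellings produce identical canonical text,
# and the CDATA marker itself is no longer present.
print(canon_cdata == canon_escape)
print("CDATA" in canon_cdata)
```

A "standard normal form" along the lines above would have to record the
CDATA boundaries as extra information on top of this output.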

As for the order of attributes, doesn't XML 1.0 specifically declare this
to be insignificant?
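
For what it is worth, XML 1.0 (section 3.1) does note that the order of
attribute specifications in a start-tag is not significant, and
canonicalization reflects that. A small sketch, again assuming Python
3.8+ and xml.etree.ElementTree.canonicalize (C14N 2.0), with made-up
element and attribute names:

```python
from xml.etree.ElementTree import canonicalize

# Two hypothetical documents that differ only in attribute order.
first = '<e b="2" a="1"/>'
second = '<e a="1" b="2"/>'

# Canonicalization writes attributes in a fixed order, so the two
# inputs become indistinguishable afterwards.
same = canonicalize(first) == canonicalize(second)
print(same)
```

So a normal form built on Canonical XML would not preserve whatever
attribute order the author originally typed.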