> -----Original Message-----
> From: John Aldridge [mailto:email@example.com]
> Sent: Thursday, March 29, 2001 5:35 AM
> To: xml-dev
> Subject: RE: Syntax Sugar and XML information models
> (b) Editors all write a _standard_ normal form (i.e. not just
> a normal form of their own choosing)
This is more or less what I was hoping we could collectively define, and
"standard normal form" sounds a lot better than "Syntax Sugar Information
Set." And to answer Rick Jelliffe's question, I agree that the W3C InfoSet is
a reasonable model for what people care about when navigating or transforming
a document, but we need a richer model for editors and databases. (These are two
sides of the same coin, since a database must round-trip whatever is significant
to an editor, and an editor must preserve whatever is significant to a database.)
BUT I'm not sure I agree "that means you are *not* interested in the information
set of the document, but the actual text of the document's entities. That is a
fine thing. Let there be element-based (infoset) editors and entity-based
(tag-aware) editors". Databases (and arguably editors) *should* be interested in
the information set of a document rather than just the bytes that make it up,
but they need a richer information set than the W3C InfoSet.
I'm hoping to find a middle ground between "editors and databases must simply
round-trip the (core) infoset" and "editors and databases must round-trip every
single character". My first cut at this is that the "standard normal form" is
Canonical XML + external entity references + CDATA sections ...
I'm sure there is more.
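
To make the gap concrete, here is a rough sketch of what plain canonicalization
throws away. It is only an illustration, not a proposal: it assumes Python 3.8
or later, whose standard library function xml.etree.ElementTree.canonicalize
implements C14N 2.0 rather than the Canonical XML 1.0 recommendation, and the
two sample documents are made up for the example.

```python
from xml.etree.ElementTree import canonicalize

# Two hypothetical documents an editor would treat as different:
# one uses a CDATA section and one attribute order, the other uses
# an escaped character and the opposite attribute order.
doc_a = '<doc b="2" a="1"><![CDATA[x < y]]></doc>'
doc_b = '<doc a="1" b="2">x &lt; y</doc>'

# canonicalize() returns the canonical form as a string when no
# output file is given.
canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)

# Canonicalization replaces the CDATA section with escaped character
# data and sorts the attributes, so the two forms come out the same.
print(canon_a)
print(canon_a == canon_b)
```

Both inputs collapse to the same canonical text, so the CDATA section (and any
entity reference an editor cared about) cannot be recovered from the canonical
form alone. That is the information a "standard normal form" would have to
carry in addition.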
As for the order of attributes, doesn't XML 1.0 specifically declare this to be