   Re: [xml-dev] lots of WS reading material


At 08:10 26/04/2002 -0400, W. E. Perry wrote:

>John Cowan wrote:
> > Sure it does.  The Infoset is ridiculously close to the XML surface; it
> > just abstracts away crap like "How many spaces between attributes?" and
> > "What kind of quotation mark?" and the like.  About the only thing that
> > disappears without a trace is the physical entity structure.


>Nevertheless, in philology
>the pivotal discovery of the twentieth century was of the primacy of

I've seen this debate go round a few times now, and I confess to not having 
understood it yet, probably because I don't have the background in 
philosophy to understand the terms being used.

In so far as I /do/ understand, it seems that some believe that what XML is 
really about is a logical data model consisting of elements, attributes and 
suchlike (the infoset's Information Items). The concrete XML syntax is a 
language which can be used to represent that logical data model.
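To make that concrete, here is a small sketch (using Python's standard ElementTree module, purely as an illustration) of two documents that differ in surface syntax but are indistinguishable at the logical-model level:

```python
import xml.etree.ElementTree as ET

# Two documents differing only in surface syntax: quote style,
# whitespace between attributes, attribute order.
doc_a = '<item  id="1"   lang=\'en\'/>'
doc_b = "<item lang='en' id='1' />"

a = ET.fromstring(doc_a)
b = ET.fromstring(doc_b)

# At the infoset level they carry the same information items:
assert a.tag == b.tag
assert a.attrib == b.attrib
```

A tool working at this level never sees, and cannot preserve, the quoting or spacing differences.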

The others believe that the concrete character-by-character syntax of the 
XML is all that matters, and that trying to pretend that it is a 
representation of some more abstract data model is bound to result in 
discarding semantic information which was important to the author.

I'm assuming that this latter camp, for example, believes that section 
3.3.3 of the XML Rec, which starts

     Before the value of an attribute is passed to the application
     or checked for validity, the XML processor must normalize
     the attribute value by applying the algorithm below...

is exactly such a piece of poor design, and that the validation algorithm 
ought to have been specified to work on the attribute value as written. If 
this assumption is wrong, I'd like an explanation -- it seems to me that 
this mandatory normalization is exactly equivalent to an infoset-style 
abstraction.
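You can observe that mandatory normalization with any conforming parser; a sketch using Python's ElementTree (which sits on expat) as an example processor:

```python
import xml.etree.ElementTree as ET

# A literal newline inside an attribute value as written...
doc = '<p title="first line\nsecond line"/>'

elem = ET.fromstring(doc)
# ...reaches the application already normalized to a space,
# per section 3.3.3 of the XML Rec:
print(repr(elem.get('title')))  # 'first line second line'
```

The character-for-character form of the attribute value as authored is gone before the application ever sees it.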

I guess we all agree that some degree of abstraction from the underlying 
representation is desirable?  No-one cares that my XML data has actually 
been split into 512 byte chunks for storage in some physical filing system.

/Are/ we just disagreeing about the amount of abstraction which is 
desirable, or is there some difference of kind between the good 
abstractions and the bad ones?

For what it's worth, I find myself in the pro-infoset camp. One of the 
strengths of XML is that it allows me to compose specific applications out 
of general purpose tools (e.g. SAX, XSLT). In the absence of some notion of 
the logical data model represented by XML, these general purpose tools are 
not going to be composable in this manner.
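As one illustration of such a general-purpose tool (a hypothetical example, using Python's standard SAX binding): a handler that counts elements knows nothing about any particular vocabulary, only the logical model of elements that the parser exposes, so it works unchanged on any XML document:

```python
import xml.sax

# A general-purpose tool: counts elements per name. It depends only
# on the logical model (element events), not on any concrete syntax.
class ElementCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

handler = ElementCounter()
xml.sax.parseString(b"<doc><p/><p/><note/></doc>", handler)
print(handler.counts)  # {'doc': 1, 'p': 2, 'note': 1}
```

Without the shared logical model, every such tool would have to define its own notion of what an "element" is, and they would stop composing.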


