   Still not the essence of XML (was Re: [xml-dev] S-expressions vs. XML)


From: "Alaric Snell" <alaric@alaric-snell.com>

> One other point: Don't confuse LISP and s-exprs, as a few posts I've just 
> seen on this kind of do.
> s-exprs are a way of writing information, kind of like XML.
> LISP is a language based around an s-expr data model that happens to use 
> s-exprs also for its written syntax, kind of like XSLT.
And similarly, don't confuse XML with its infoset, let alone the PSVI. It is the infoset, the PSVI, or canonical XML that is (paradoxically) closest to S-exprs (if we accept properties as part of S-exprs). S-exprs as such have no equivalent to the encoding declaration in the XML declaration, or to entities. Without a convention for unambiguously labelling the encoding of the print form and for allowing a plurality of encodings, S-exprs just perpetuate the character set mess that XML helped us escape from.

Another contribution to the XML==S-expr discussions, and one which also blithely ignores any issues of encoding, is the new version of Wadler and Simeon's "The Essence of XML", worthwhile reading at 

I made some comments on a previous draft on XML-DEV in "Not the essence of XML": http://lists.xml.org/archives/xml-dev/200207/msg00836.html

The most impressive thing about this may be the politeness of the authors: they say "XML
is touted as an external format for representing data. This is not a hard problem.
All we require are two properties...Lisp S-expressions, for example, possess these
properties.//XML possesses neither property." Where is the politeness? Rather
than saying "The people who use XML for more than it was designed for may be mad, bad
or irrationally exuberant", they blame the messages. Yet in avoiding the issues of
encoding and of constructing documents from parts, they miss two other properties of
an external format: "modularity" and "reliability".

Many people thinking too much in terms of the XML Infoset seem to think that the issue of
labelling encoding is peripheral to XML, whereas I think it is central. There are
no other layers or channels for encoding labels to get passed, practically speaking;
XML is basically the only format that deals with this issue.  The rigorous labelling
of character encoding is the essence of XML, just as much as the angle brackets
or the element tree.
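
To make the labelling point concrete, here is a minimal Python sketch (an illustration, not something from the paper or the thread). The S-expr string, the variable names, and the choice of ISO-8859-1 versus CP437 as the two competing guesses are all assumptions picked for the example; the only library call is the standard `xml.etree.ElementTree.fromstring`.

```python
import xml.etree.ElementTree as ET

word = "caf\u00e9"  # "cafe" with an acute accent; the escape keeps this source ASCII

# Unlabelled print form (a hypothetical S-expr): nothing in the bytes says
# which encoding produced them, so two readers can legitimately disagree.
sexpr_bytes = f'(word "{word}")'.encode("iso-8859-1")
as_latin1 = sexpr_bytes.decode("iso-8859-1")
as_cp437 = sexpr_bytes.decode("cp437")  # also decodes without error

# Labelled print form: the XML declaration carries the encoding in-band,
# so the parser does not have to guess.
xml_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    f"<word>{word}</word>"
).encode("iso-8859-1")
parsed = ET.fromstring(xml_bytes).text

print(as_latin1 == as_cp437)  # do the two guesses agree?
print(parsed == word)         # did the labelled form survive?
```

Both decodings of the unlabelled bytes succeed silently, which is the problem: there is no error to tell the reader that the guess was wrong.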

I think Simeon and Wadler's basic introductory spin is still wrong: 

* The property of self-describing, as they seem to use it (which I think is good), seems to depend on there being enough lexical forms for each datatype. But by the time you add dates and derived types, you would need to extend basic S-expr syntax. You would need to know all the (primitive) types you wanted to support at syntax-design-time, which rather goes against the point of XML. And, at the other end, if you are only interested in the kinds of limited datatypes required for publishing (string and various symbols: token, tokens, ID, IDs, IDREF, IDREFs, enumerations, etc.), the lexical forms of markup and built-in DTDs are enough to make XML self-describing.

* For the property of round-tripping, it strikes me that their argument only holds against XML Schemas and has nothing to do with plain XML, so they are still playing fast and loose to get a good title. Good for journalism, but surprising in an academic paper.
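
The two bullets above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the paper's formalism: `read_atom`, `plain_read`, `typed_read` and `to_text` are hypothetical helpers, the mini-reader knows only integers, quoted strings and symbols, and the "integer or string" union stands in for a schema-assigned type.

```python
import re

def read_atom(token: str):
    """Classify one atom purely by its lexical form (hypothetical mini-reader)."""
    if re.fullmatch(r"-?\d+", token):
        return ("integer", int(token))
    if re.fullmatch(r'"[^"]*"', token):
        return ("string", token[1:-1])
    # No lexical form was reserved for dates, so a date is just a symbol.
    return ("symbol", token)

def to_text(value) -> str:
    """Write a value out as untagged character data."""
    return str(value)

def plain_read(text: str) -> str:
    """Plain, schema-less reading: character data stays a string."""
    return text

def typed_read(text: str):
    """Schema-like reading with a union 'integer or string': integer wins."""
    try:
        return int(text)
    except ValueError:
        return text

print(read_atom("42"), read_atom('"42"'), read_atom("2002-07-30"))
print(plain_read(to_text("42")) == "42")  # untyped round trip
print(typed_read(to_text("42")) == "42")  # typed round trip
```

In this sketch the string "42" survives the untyped round trip but comes back as an integer once an external type assignment is layered on top, which is the sense in which the round-tripping objection seems to be about schema typing rather than about plain XML.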

So their title and opening section are still misleading or wrong: not the essence of XML but the essence of XML Schema. I guess by hanging around XQuery people all the time, all the authors ever hear of XML is XML+WXS conflated, but I wish they would spare the rest of us.* At least their abstract is correct. And the body of the paper? I found it very interesting on a lot of fronts, and well worth a delve.

Rick Jelliffe

* Perhaps it shows mindset at work that XQuery is "reforming" XML from a relatively untyped format with strings and tokens, suitable for loosely-coupled systems and usable with any datatyping convention, to a strongly typed format with a fixed number of primitive built-in types, suitable for tightly-coupled systems: I heard a member of the XQuery WG say "without types you can't do anything!"



Copyright 2001 XML.org. This site is hosted by OASIS