   How simple is simple enough? - was Re: [xml-dev] xml 2.0 - so it's on th

On Fri, 04 Feb 2005 20:50:00 -0500, Elliotte Harold
<elharo@metalab.unc.edu> wrote:
> Bob Foster wrote:
> > A barrier to entry is a barrier to entry. The fact that it's
> > hard to write an XML parser means it's hard to understand how to use
> > XML, too.
>  However, there are also cases where the
> difficulty of parsing XML is a direct result of making it easier to
> author and read. The matching of start-tags to end-tags is such an
> example. As has been pointed out elsewhere, this causes problems for
> certain kinds of grammars and parser generators. Yet it's a crucial part
> of making XML easier to use than SGML was.
> The bottom line is I don't find any arguments based on the difficulty of
> parsing XML to be really compelling, given that we have existence proofs
> that correct, efficient parsing of XML is possible.

Well, we're in permathread territory here, but what the heck. For what
it's worth, I agree with Bob Foster's point that the complexity of the
XML corner cases has a real impact on its usability.  I was reminded
of C. A. R. Hoare's famous quote "There are two ways of constructing a
piece of software: One is to make it so simple that there are
obviously no errors, and the other is to make it so complicated that
there are no obvious errors."  I submit that the same principle
applies to data formats, especially given the vastly increased
importance of robust software in these days of rampant malware.

The existence proof of correct, efficient XML parsers is true only in
a somewhat academic sense.  XML's "efficiency" is not exactly its big
selling point these days, as Robin Berjon frequently reminds us.
Likewise, there is obviously a community of people who understand XML
as specified, can write tools easily, and can produce well-formed XML
without thinking about it.  At least for me, that's not the target audience.
Out there in the real world are legions of people who still get
confused about the oddities in XML and re-invent ideas to make it
easier that were discussed to death in the WG and xml-dev over the
years.  (http://www.tjansen.de/blogen/2003/12/10-things-i-hate-about-xml.html
was brought to my attention recently, as was
http://geekswithblogs.net/rebelgeekz/archive/2004/02/28/2433.aspx  --
which takes an opposite position from Elliotte Harold on the
user-friendliness of end tags).

I don't necessarily agree that anyone who understands formal grammars
at the CS undergrad level should be able to write an XML parser.  On
the other hand there is something to be said for making it less
necessary to know a whole lot of folklore about why things are the way
they are (but are not specified in the formal grammar) in order to
write interoperable software. I once watched someone who knew a lot of
CS but little XML try to write an XML parser ... it was not a pretty
sight or successful project. I get a fair number of inquiries from
users or developers about murky questions of XML syntax that are just
not obvious from the spec but well known in the folklore. (I would be
up a creek without Tim Bray's Annotated XML, let me tell you!). It's
great that we who know just enough to go to the annotated spec or
search the xml-dev archives have job security :-) but it would be much
nicer if this were not necessary.  That's going to take some work,
assuming XML 2.0 ever happens. (I recommend Tim's interview in the
February ACM Queue, in which he says it won't).

Still, whether or not there is an XML 2.0, there is almost certainly
going to be a withering away of the under-used legacy stuff and a move
toward tools that implicitly deprecate it or at least relegate it to
the "difficult things must be possible" corner.   I'm sure this is
anathema on this list, but hand-authored XML is just not a mainstream
use case anymore, and it's going to be harder and harder to make a
business case for keeping around the stuff (half the productions, I'll
guess?) that exists just to facilitate it.  In the long run, what the
Pointy Haired Bosses find compelling is more persuasive than what the
experts find compelling, I'm afraid.

Of course, given XML's acceptance today, this move toward
simplification will take a while to play out without causing
excessive disruption.  (My own uber-not-so-pointy haired boss just
drove another stake in the ground for XML as the basis of interop -
Still, in the long run I'll predict that the need for software with
"obviously no errors" will outweigh the need for
backward-compatibility with hand-edited and other first-generation
XML.  As a practical matter, I think that means what Rick Jelliffe
talks about in http://www.oreillynet.com/pub/wlg/6386 "have regular
periodic reviews (like ISO standard's 10-year reviews) and releases so
that the technology can improve in a way that vendors and users can
work with it."

Or, as he puts it less formally, "W3C determines who is happy with XML
the way it is and who is unhappy. Then they define a shiny new XML
which will make all the happy people unhappy, but make all the unhappy
people happy."  It will be 7 years next Thursday, about time to think
about that 10-year review, and maybe time to screw up the courage to
make some old-timers unhappy :-)


Copyright 2001 XML.org. This site is hosted by OASIS