xml-dev - Re: [xml-dev] XML's Scylla and Charybdis

To: <xml-dev@lists.xml.org>
Subject: Re: [xml-dev] XML's Scylla and Charybdis - parse and regexp
From: "Rick Jelliffe" <ricko@allette.com.au>
Date: Wed, 2 Apr 2003 03:17:20 +1000
References: <5.2.0.9.0.20030401113733.024d1168@ncmail.datadirect-technologies.com>

From: "Jonathan Robie" <jonathan.robie@datadirect-technologies.com>

> At 08:35 AM 4/1/2003 -0800, Dare Obasanjo wrote:
> 
> >In my experience faithful lexical round tripping is mainly important to 
> >applications that act as editors. In such cases, the people requesting 
> >such features in an API want even more requirements than XML 1.0 deems 
> >necessary such as preserving attribute order and all whitespace.
> 
> Yes, it's clearly helpful for that. I also know from XML databases I've 
> worked with that people really do get upset if you change namespace 
> prefixes when they import a document and export it again - but most editors 
> and databases do seem to sacrifice some faithfulness in their lexical 
> round-tripping.

One important application of XML is as source code.  

Imagine if a programming editor opened your Perl/Python/C#/Java/C++/SQL
program, renamed your modules, classes, private methods or packages,
and threw away your comments. You would undoubtedly spew, despite your
admirably easy-going nature.

Someone wrote:

> >If the Information Set says that there is no distinction between <foo
> >"a"/> and <foo 'a'/>, why should I work hard to preserve the
> >distinction?

Because syntactic sugar is vital for humans, and can help processing.

This here thread comes out of a complaint that XML is too complex
to process with regexes. Yet if we canonicalize our data (say, so that only
the form <foo x="a" /> is ever used) then the regular expressions simplify
to something much more usable.  If other software messes up this canonical
form, then we have to re-canonicalize it.  (Which suggests not that we should
work hard to preserve the distinction, but that where it is convenient we
should support it.)
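
To make the canonicalize-then-regex idea concrete, here is a minimal sketch
in Python. It is an illustration only, not something proposed in this thread:
it assumes that re-serializing through the standard library's
xml.etree.ElementTree is an acceptable canonicalizer (it writes attributes
with double quotes and empty elements as <foo x="a" />), and the names
canonicalize, FOO_X and foo_x_values are made up for the example. Note that
this particular round trip discards comments and may rewrite namespace
prefixes, so it only suits data where those do not matter.

```python
import re
import xml.etree.ElementTree as ET


def canonicalize(xml_text: str) -> str:
    """Re-serialize XML into one fixed lexical form.

    Assumption: ElementTree's default serialization is used as the
    canonical form (double-quoted attributes, empty elements written
    as <tag attr="v" />). Comments in the input are dropped.
    """
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


# Once only one lexical form exists, a very plain regex is enough.
FOO_X = re.compile(r'<foo x="([^"]*)" />')


def foo_x_values(xml_text: str) -> list:
    """Return the x attribute of every empty <foo x="..." /> element."""
    return FOO_X.findall(canonicalize(xml_text))


if __name__ == "__main__":
    # Three lexically different spellings of the same kind of element.
    messy = "<doc><foo x='a'/><foo  x = \"b\" /><foo x=\"c\"></foo></doc>"
    print(canonicalize(messy))
    print(foo_x_values(messy))
```

Without the canonicalization step, the regex would have to allow for either
quote character, optional whitespace around "=", and both the <foo/> and
<foo></foo> spellings of an empty element.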

Cheers
Rick Jelliffe

Follow-Ups:
- Re: [xml-dev] XML's Scylla and Charybdis - parse and regexp
  - From: Jonathan Robie <jonathan.robie@datadirect-technologies.com>

References:
- RE: [xml-dev] XML's Scylla and Charybdis - parse and regexp
  - From: Jonathan Robie <jonathan.robie@datadirect-technologies.com>
