xml-dev - Re: [xml-dev] XML's Scylla and Charybdis

Re: [xml-dev] XML's Scylla and Charybdis - parse and regexp


To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] XML's Scylla and Charybdis - parse and regexp
From: Mike Champion <mc@xegesis.org>
Date: Tue, 01 Apr 2003 08:15:29 -0500
In-reply-to: <5.1.0.14.0.20030401093533.025e5300@mail.propylon.com>
References: <5.1.0.14.0.20030401093533.025e5300@mail.propylon.com>
User-agent: Opera7.03/Win32 M2 build 2670

On Tue, 01 Apr 2003 09:48:59 +0100, Sean McGrath 
<sean.mcgrath@propylon.com> wrote:

[Checking very carefully to see if this is one of Sean's famous April Fool
jokes ... hmm, no, that's another thread]

> Correctness or input fidelity - pick one - you cannot have both.
>
> This is at the core of why I've always argued that we *do* need a data 
> model for XML and we *do* need something like
> common XML because I want my processing to be both correct *and*
> non-lossy (high input fidelity).
>
> Is that too much to ask?

Let me make sure I understand ... we need a definitive data model so that 
one can work with the normalized information in an XML document 
irrespective of whatever "syntax sugar" was used to represent the 
information, and we need something like Common XML to define a canonical
serialization of the data model that will not lose fidelity through
successive parse/serialization stages?

I agree.  It sounds like existing data models don't quite do the job 
because they don't (except for the DOM data model, which has its own 
problems) let one keep unexpanded entity references around.  Likewise 
Common XML as sml-dev defined it doesn't include entity definitions and 
references.

I strongly agree if we're saying that XML (or some successor) needs to:

a) treat the syntax and data model as two halves of the same whole;
b) *conceptually* handle "syntax sugar" in a preprocessing phase where
   CDATA sections are handled, whitespace normalized, quotes standardized,
   [entities expanded ???], comments stripped out, [PIs stripped out ???],
   etc.;
c) base the actual core grammar on "Common XML", so that text operations
   on the common/canonical syntax can be correct and non-lossy;
d) acknowledge alternate serializations of the data model as "legal"
   insofar as they reliably and losslessly round-trip with the
   common/canonical syntax;
e) treat additional information, such as that introduced by schemas and
   other datatyping schemes, as another layer on top of all this.

That lets XML be text for text-processing people and Desperate Perl/Python
Hackers, and data for data-processing people, sharing common technologies
where that makes sense while adding separate layers for specialized needs.
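For illustration only, here is a rough sketch of what the "syntax sugar"
preprocessing in (b) and the round-trip property in (d) look like in
practice, assuming Python 3.8+ and its standard-library ElementTree. This is
not Common XML or any canonical form discussed on the list; the function
name normalize is hypothetical. It relies only on ElementTree's default
behaviour: the parser drops comments and folds CDATA sections into ordinary
text, and the serializer always double-quotes attributes, so re-parsing the
output reproduces the same output.

```python
import xml.etree.ElementTree as ET


def normalize(xml_text: str) -> str:
    """Hypothetical helper: parse, then re-serialize, discarding "syntax sugar".

    With the default parser, comments are dropped and CDATA sections become
    plain text; the serializer escapes '<' as '&lt;' and double-quotes
    attributes, giving one normalized spelling of the parsed data.
    """
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


original = "<doc a='1'><![CDATA[x < y]]><!-- note --></doc>"
once = normalize(original)
twice = normalize(once)

print(once)           # <doc a="1">x &lt; y</doc>
print(once == twice)  # True: the normalized form survives another round trip
```

Of course this is exactly the lossy direction Sean is worried about: the
comment and the CDATA markup are gone for good, which is why (b) says
*conceptually*.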
