xml-dev - Re: [xml-dev] Some comments on the 1.1 draft


To: <xml-dev@lists.xml.org>
Subject: Re: [xml-dev] Some comments on the 1.1 draft
From: "Rick Jelliffe" <ricko@allette.com.au>
Date: Wed, 19 Dec 2001 18:02:55 +1100
References: <5C39F806F9939046B4B1AFE652500A3A251914@RED-MSG-10.redmond.corp.microsoft.com>

From: "Michael Rys" <mrys@microsoft.com>
 
> Well, that may have been the original XML 1.0 use, but looking at where
> XML is currently having the most traction (SOAP, Messaging, WebDav,
> database serialization etc), this has changed.
 
One big advantage of disallowing control characters from XML documents
and silly characters from XML names is that it catches most common encoding errors.

Take, for example, the very common problem of data labelled ISO 8859-1 that
contains a 0x80 byte (the Euro character in Windows code page 1252).
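
A minimal sketch of this kind of check, in Python. It assumes the stricter policy argued for here: it flags DEL and the C1 controls (U+007F to U+009F) as well as the C0 controls that XML 1.0 already excludes from documents. The function name suspect_controls is hypothetical, not part of any XML library.

```python
def suspect_controls(data: bytes, declared_encoding: str) -> list:
    """Decode data under its declared label and list the control characters found.

    Returns (offset, code point) pairs; an empty list means nothing suspicious.
    """
    text = data.decode(declared_encoding)
    allowed = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return
    hits = []
    for offset, ch in enumerate(text):
        cp = ord(ch)
        is_c0 = cp < 0x20 and cp not in allowed
        is_del_or_c1 = 0x7F <= cp <= 0x9F  # assumption: treat these as suspect too
        if is_c0 or is_del_or_c1:
            hits.append((offset, f"U+{cp:04X}"))
    return hits


# Windows-1252 bytes for a Euro sign followed by "5", wrongly labelled ISO 8859-1.
mislabelled = "\u20ac5".encode("cp1252")
print(suspect_controls(mislabelled, "iso-8859-1"))

# Genuine ISO 8859-1 text contains no control characters.
print(suspect_controls("caf\u00e9".encode("iso-8859-1"), "iso-8859-1"))
```

The mislabelled Euro byte surfaces as a control character at offset 0, which is the signal that the label and the data disagree.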

At the moment XML provides the only disciplined point in the processing chain:
when data is in XML one *must* have the encoding correct.  This may
cause some consternation to us programmers, who perhaps have lived in a fool's 
paradise where encoding does not matter, but it is a fundamental point
of Quality Control for XML documents and exposes data corruption at the point
where it can be corrected.
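
As a rough illustration of that checkpoint, the sketch below feeds the same bytes to Python's standard-library parser under two different encoding declarations. The helper name parses_ok and the sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def parses_ok(document: bytes) -> bool:
    """Return True if the parser accepts the document, False if it rejects it."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# The same ISO 8859-1 bytes, once labelled correctly and once labelled UTF-8.
latin1_bytes = "<a>caf\u00e9</a>".encode("iso-8859-1")
correct = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_bytes
wrong = b'<?xml version="1.0" encoding="UTF-8"?>' + latin1_bytes

print(parses_ok(correct))
print(parses_ok(wrong))
```

The correctly labelled document should be accepted and the mislabelled one rejected, because the lone 0xE9 byte is not valid UTF-8. The error appears at the sender's end, where it can still be fixed.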

To allow control characters would make us sink back into the horrible mess
of which everyone familiar with working in multi-character-set environments
without XML is well aware (or, at least, becomes well aware when everything
comes crashing down).

Most DBMSs do not perform any checking of encoding. So you
can store almost anything in, say, a DBMS expecting ISO 8859-1.  With
a world full of data incorrectly labelled, there is no chance of good 
interoperability without some basic checking. And those basic checks
are what XML's data character and naming rules provide. 

Without them, sure, XML would be "simpler" and we could attempt to transmit
arbitrary strings around. But then encoding detection or repair would be
the problem of the recipient and not the sender: a responsible recipient 
can have no faith that their non-ASCII data has not been corrupted.

And that lies at the heart of the matter: if we allow control characters
and silly name characters, we won't actually increase the number of
characters that can be reliably sent: we will just make non-ASCII
characters suspect and unreliable.

Cheers
Rick Jelliffe
