xml-dev - RE: [xml-dev] Some comments on the 1.1 draft


To: "Rick Jelliffe" <ricko@allette.com.au>,<xml-dev@lists.xml.org>
Subject: RE: [xml-dev] Some comments on the 1.1 draft
From: "Michael Rys" <mrys@microsoft.com>
Date: Wed, 19 Dec 2001 10:14:29 -0800
Thread-index: AcGIWlNTdhMt326BQC6wJfLeWz+qpQAXCfZw
Thread-topic: [xml-dev] Some comments on the 1.1 draft

Rick, I don't have a strong opinion on the name encoding (since our
products and SQLX already use an encoding that is a valid 1.0 name). 

I don't understand your encoding issues though. I am mainly talking
about the Unicode code points. If somebody uses an encoding where U+0085
is not a valid character, then it should be an error. If it is a valid
character but not the intended Unicode character, then it is an error
that a parser may not be able to detect (we certainly can get this even
in XML 1.0).
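
To make the two cases concrete, here is a rough sketch in Python (not
something used in this thread; the helper name decode_byte is made up
for illustration). It decodes the single byte 0x85 under several
encoding labels: some reject it outright, while others accept it but
map it to different code points, and that second case is the one a
parser cannot detect on its own.

```python
def decode_byte(raw: bytes, encoding: str) -> str:
    """Decode raw under the given label; return code points or 'error'.

    Hypothetical helper for illustration only.
    """
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        # The byte sequence is not valid in this encoding: detectable.
        return "error"
    # Valid in this encoding: report which code points came out.
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# The same byte, interpreted under four different labels.
results = {
    enc: decode_byte(b"\x85", enc)
    for enc in ("ascii", "utf-8", "iso-8859-1", "cp1252")
}

for enc, outcome in results.items():
    print(f"{enc:>10}: {outcome}")
```

Under ISO 8859-1 the byte comes out as the control character U+0085,
while under Windows-1252 it comes out as U+2026, so a wrong label
silently changes the character without raising any decoding error.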

I can assure you that the database community has even more encoding
support than many XML processors (look up collations). 

Best regards
Michael

> -----Original Message-----
> From: Rick Jelliffe [mailto:ricko@allette.com.au]
> Sent: Tuesday, December 18, 2001 23:03 PM
> To: xml-dev@lists.xml.org
> Subject: Re: [xml-dev] Some comments on the 1.1 draft
> 
> From: "Michael Rys" <mrys@microsoft.com>
> 
> > Well, that may have been the original XML 1.0 use, but looking at where
> > XML is currently having the most traction (SOAP, Messaging, WebDav,
> > database serialization etc), this has changed.
> 
> One big advantage of disallowing control characters from XML documents
> and silly characters from XML names is that it catches most common
> encoding errors.
> 
> For example, the very common problem of data labelled ISO 8859-1 containing
> a 0x85 byte (for the Euro character).
> 
> At the moment XML provides the only disciplined point in the processing chain:
> when data is in XML one *must* have the encoding correct.  This may
> cause some consternation to us programmers, who perhaps have lived in a fool's
> paradise where encoding does not matter, but it is a fundamental point
> of Quality Control for XML documents and exposes data corruption at the point
> where it can be corrected.
> 
> To allow control characters would make us sink back into the horrible mess
> that everyone familiar with working in multi-character set environments without
> XML is well aware of (or, at least, becomes well aware of when everything comes
> crashing down).
> 
> Most DBMS systems do not perform any checking of encoding. So you
> can store almost anything in, say, a DBMS expecting ISO 8859-1.  With
> a world full of data incorrectly labelled, there is no chance of good
> interoperability without some basic checking. And those basic checks
> are what XML's data character and naming rules provide.
> 
> Without them, sure XML would be "simpler" and we could attempt to transmit
> arbitrary strings around. But then encoding detection or repair would be
> the problem of the recipient and not the sender: a responsible recipient
> can have no faith that their non-ASCII data has not been corrupted.
> 
> And that lies at the heart of the matter: if we allow control characters
> and silly name characters, we won't actually increase the number of
> characters that can be reliably sent: we will just make non-ASCII
> characters suspect and unreliable.
> 
> Cheers
> Rick Jelliffe
