- To: "Rick Jelliffe" <firstname.lastname@example.org>,<email@example.com>
- Subject: RE: [xml-dev] Some comments on the 1.1 draft
- From: "Michael Rys" <firstname.lastname@example.org>
- Date: Wed, 19 Dec 2001 10:14:29 -0800
- Thread-index: AcGIWlNTdhMt326BQC6wJfLeWz+qpQAXCfZw
- Thread-topic: [xml-dev] Some comments on the 1.1 draft
Rick, I don't have a strong opinion on the name encoding (since our
products and SQLX already use an encoding that is a valid 1.0 name).
I don't understand your encoding issues though. I am mainly talking
about the Unicode code points. If somebody uses an encoding where U+0085
is not a valid character, then the parser should report an error. If it
is a valid character but not the intended Unicode character, then it is
an error that a parser may not be able to detect (we certainly can get
this even in XML 1.0).
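
To make the two cases above concrete, here is a minimal Python sketch. It
assumes a single byte 0x85 arriving under different encoding labels; the
helper names (strict_decode, is_xml10_char) are hypothetical and not taken
from any product mentioned in this thread. The range check follows the Char
production of XML 1.0.

```python
# Illustrative sketch only: one byte, several possible encoding labels.
raw = b"\x85"


def strict_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


def is_xml10_char(ch: str) -> bool:
    """True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Case 1: the byte is not valid in the labelled encoding, so decoding fails
# and the mistake is detectable.
as_utf8 = strict_decode(raw, "utf-8")        # None: lone continuation byte

# Case 2: the byte is valid under both labels but means different things.
as_latin1 = strict_decode(raw, "iso-8859-1")  # U+0085, a C1 control (NEL)
as_cp1252 = strict_decode(raw, "cp1252")      # U+2026, horizontal ellipsis

# Both results are legal XML 1.0 characters, so a character-level check
# alone cannot tell which one the sender intended.
both_legal = is_xml10_char(as_latin1) and is_xml10_char(as_cp1252)
```

In this sketch only the first case is detectable by the decoder; the second
is a mislabelling that passes both decoding and the XML 1.0 character check.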
I can assure you that the database community has even more encoding
support than many XML processors (look up collations).
> -----Original Message-----
> From: Rick Jelliffe [mailto:email@example.com]
> Sent: Tuesday, December 18, 2001 23:03 PM
> To: firstname.lastname@example.org
> Subject: Re: [xml-dev] Some comments on the 1.1 draft
> From: "Michael Rys" <email@example.com>
> > Well, that may have been the original XML 1.0 use, but looking at where
> > XML is currently having the most traction (SOAP, Messaging, WebDAV,
> > database serialization etc.), this has changed.
> One big advantage of disallowing control characters from XML documents
> and silly characters from XML names is that it catches most common
> encoding errors.
> For example, the very common problem of data labelled ISO 8859-1
> a 0x85 byte (for the Euro character).
> At the moment XML provides the only disciplined point in the processing
> when data is in XML one *must* have the encoding correct. This may
> cause some consternation to us programmers, who perhaps have lived in a
> paradise where encoding does not matter, but it is a fundamental point
> of Quality Control for XML documents and exposes data corruption at a
> point where it can be corrected.
> To allow control characters would make us sink back into the horrible
> that everyone familiar with working in multi-character set
> XML is well aware (or, at least, becomes well aware when everything comes
> crashing down).
> Most DBMS systems do not perform any checking of encoding. So you
> can store almost anything in, say, a DBMS expecting ISO 8859-1. With
> a world full of data incorrectly labelled, there is no chance of good
> interoperability without some basic checking. And those basic checks
> are what XML's data character and naming rules provide.
> Without them, sure, XML would be "simpler" and we could attempt to send
> arbitrary strings around. But then encoding detection or repair would be
> the problem of the recipient and not the sender: a responsible
> can have no faith that their non-ASCII data has not been corrupted.
> And that lies at the heart of the matter: if we allow control characters
> and silly name characters, we won't actually increase the number of
> characters that can be reliably sent: we will just make non-ASCII
> characters suspect and unreliable.
> Rick Jelliffe