- To: "Rick Jelliffe" <ricko@allette.com.au>,<xml-dev@lists.xml.org>
- Subject: RE: [xml-dev] Some comments on the 1.1 draft
- From: "Michael Rys" <mrys@microsoft.com>
- Date: Wed, 19 Dec 2001 10:14:29 -0800
- Thread-index: AcGIWlNTdhMt326BQC6wJfLeWz+qpQAXCfZw
- Thread-topic: [xml-dev] Some comments on the 1.1 draft
Rick, I don't have a strong opinion on the name encoding (since our
products and SQLX already use an encoding that is a valid 1.0 name).
I don't understand your encoding issues, though. I am mainly talking
about the Unicode code points. If somebody uses an encoding in which
U+0085 is not a valid character, then that should be an error. If it is
a valid character but not the intended Unicode character, then it is an
error that a parser may not be able to detect (we can certainly get this
even in XML 1.0).
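
To make the two cases concrete, here is a minimal Python sketch (the
helper name decode_byte is made up for illustration and is not from any
product discussed here). It decodes the single byte 0x85 under several
encoding labels: where the byte is not valid, the decoder raises an
error; where it is valid, the decoder silently yields whatever code
point that label assigns, whether or not that was the intended one.

```python
# Hypothetical helper for illustration only: decode raw bytes under a
# given encoding label and report the resulting code points.
def decode_byte(raw: bytes, encoding: str) -> str:
    """Return 'U+XXXX' per decoded character, or 'error' if invalid."""
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        # Case 1: the byte is not valid in this encoding, so the
        # decoder can detect the problem and report it.
        return "error"
    # Case 2: the byte is valid, so the decoder cannot know whether the
    # resulting code point is the one the sender intended.
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


labels = ("ascii", "utf-8", "iso-8859-1", "cp1252")
results = {label: decode_byte(b"\x85", label) for label in labels}

for label, outcome in results.items():
    print(f"{label:10} -> {outcome}")
```

Under ISO 8859-1 the byte becomes the control character U+0085, while
under Windows-1252 the same byte becomes U+2026 (horizontal ellipsis).
Both decode without complaint, so a mislabelled document of this kind is
not something the decoder alone can catch.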
I can assure you that the database community has even more encoding
support than many XML processors (look up collations).
Best regards
Michael
> -----Original Message-----
> From: Rick Jelliffe [mailto:ricko@allette.com.au]
> Sent: Tuesday, December 18, 2001 23:03 PM
> To: xml-dev@lists.xml.org
> Subject: Re: [xml-dev] Some comments on the 1.1 draft
>
> From: "Michael Rys" <mrys@microsoft.com>
>
> > Well, that may have been the original XML 1.0 use, but looking at where
> > XML is currently having the most traction (SOAP, Messaging, WebDav,
> > database serialization etc), this has changed.
>
> One big advantage of disallowing control characters from XML documents
> and silly characters from XML names is that it catches most common
> encoding errors.
>
> For example, the very common problem of data labelled ISO 8859-1
> containing a 0x85 byte (for the Euro character).
>
> At the moment XML provides the only disciplined point in the processing
> chain: when data is in XML one *must* have the encoding correct. This may
> cause some consternation to us programmers, who perhaps have lived in a
> fool's paradise where encoding does not matter, but it is a fundamental
> point of Quality Control for XML documents and exposes data corruption at
> the point where it can be corrected.
>
> To allow control characters would make us sink back into the horrible mess
> of which everyone familiar with working in multi-character set
> environments without XML is well aware (or, at least, becomes well aware
> when everything comes crashing down).
>
> Most DBMS systems do not perform any checking of encoding. So you
> can store almost anything in, say, a DBMS expecting ISO 8859-1. With
> a world full of data incorrectly labelled, there is no chance of good
> interoperability without some basic checking. And those basic checks
> are what XML's data character and naming rules provide.
>
> Without them, sure XML would be "simpler" and we could attempt to transmit
> arbitrary strings around. But then encoding detection or repair would be
> the problem of the recipient and not the sender: a responsible recipient
> can have no faith that their non-ASCII data has not been corrupted.
>
> And that lies at the heart of the matter: if we allow control characters
> and silly name characters, we won't actually increase the number of
> characters that can be reliably sent: we will just make non-ASCII
> characters suspect and unreliable.
>
> Cheers
> Rick Jelliffe