   RE: [xml-dev] Some comments on the 1.1 draft

  • To: "Rick Jelliffe" <ricko@allette.com.au>,<xml-dev@lists.xml.org>
  • Subject: RE: [xml-dev] Some comments on the 1.1 draft
  • From: "Michael Rys" <mrys@microsoft.com>
  • Date: Wed, 19 Dec 2001 10:14:29 -0800

Rick, I don't have a strong opinion on the name encoding (since our
products and SQLX already use an encoding that is a valid 1.0 name). 

I don't understand your encoding issues though. I am mainly talking
about the Unicode code points. If somebody uses an encoding where U+0085
is not a valid character, then the parser should raise an error. If it
is a valid character but not the intended Unicode character, then it is
an error that a parser may not be able to detect (we can certainly get
this even in XML 1.0).
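
To make the two cases concrete, here is a minimal Python sketch (the
helper name decode_byte is made up for illustration, and Python's codec
tables stand in for whatever a given parser uses). It decodes the single
byte 0x85 under several declared encodings: some reject it outright,
while ISO 8859-1 and Windows-1252 both accept it but yield different
code points, which is the case a parser cannot detect.

```python
def decode_byte(raw: bytes, encoding: str) -> str:
    """Describe how `raw` decodes under `encoding` (hypothetical helper)."""
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        # The byte sequence is not valid in this encoding: detectable.
        return "error"
    # Valid in this encoding: report the resulting code points.
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# The same byte under four different encoding labels.
results = {
    enc: decode_byte(b"\x85", enc)
    for enc in ("ascii", "iso-8859-1", "cp1252", "utf-8")
}

for enc, outcome in results.items():
    print(f"{enc:12} {outcome}")
```

Under ISO 8859-1 the byte maps to the control character U+0085, while
under Windows-1252 it maps to U+2026 (horizontal ellipsis). Both decode
without error, so only the label tells them apart.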

I can assure you that the database community has even more encoding
support than many XML processors (look up collations). 

Best regards

> -----Original Message-----
> From: Rick Jelliffe [mailto:ricko@allette.com.au]
> Sent: Tuesday, December 18, 2001 23:03 PM
> To: xml-dev@lists.xml.org
> Subject: Re: [xml-dev] Some comments on the 1.1 draft
> From: "Michael Rys" <mrys@microsoft.com>
> > Well, that may have been the original XML 1.0 use, but looking at
> > where XML is currently having the most traction (SOAP, Messaging,
> > WebDav, database serialization etc), this has changed.
> One big advantage of disallowing control characters from XML documents
> and silly characters from XML names is that it catches most common
> encoding errors.
> For example, the very common problem of data labelled ISO 8859-1
> containing a 0x85 byte (for the Euro character).
> At the moment XML provides the only disciplined point in the
> processing chain: when data is in XML one *must* have the encoding
> correct.  This may cause some consternation to us programmers, who
> perhaps have lived in a fool's paradise where encoding does not
> matter, but it is a fundamental point of Quality Control for XML
> documents and exposes data corruption at a point where it can be
> corrected.
> To allow control characters would make us sink back into the horrible
> mess of which everyone familiar with working in multi-character-set
> environments without XML is well aware (or, at least, becomes well
> aware when everything comes crashing down).
> Most DBMS systems do not perform any checking of encoding. So you
> can store almost anything in, say, a DBMS expecting ISO 8859-1.  With
> a world full of data incorrectly labelled, there is no chance of good
> interoperability without some basic checking. And those basic checks
> are what XML's data character and naming rules provide.
> Without them, sure XML would be "simpler" and we could attempt to
> send arbitrary strings around. But then encoding detection or repair
> would be the problem of the recipient and not the sender: a
> responsible recipient can have no faith that their non-ASCII data has
> not been corrupted.
> And that lies at the heart of the matter: if we allow control
> characters and silly name characters, we won't actually increase the
> number of characters that can be reliably sent: we will just make
> non-ASCII characters suspect and unreliable.
> Cheers
> Rick Jelliffe
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
> The list archives are at http://lists.xml.org/archives/xml-dev/
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://lists.xml.org/ob/adm.pl>
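
The kind of basic character check described in the quoted message can be
sketched roughly as follows. This is only an illustration: the function
names are made up, is_xml10_char follows the Char production of XML 1.0,
and flagging the C1 range (U+0080 to U+009F) as "suspect" is a heuristic
assumed here for the sake of the example, since XML 1.0 itself permits
those code points.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point `cp` matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_c1_control(cp: int) -> bool:
    """True for U+0080..U+009F, often a sign of a mislabelled encoding."""
    return 0x80 <= cp <= 0x9F


def audit(text: str) -> list[tuple[int, int, str]]:
    """Return (index, code point, verdict) for each questionable character."""
    findings = []
    for index, ch in enumerate(text):
        cp = ord(ch)
        if not is_xml10_char(cp):
            findings.append((index, cp, "illegal"))   # excluded by Char
        elif is_c1_control(cp):
            findings.append((index, cp, "suspect"))   # heuristic, not a 1.0 rule
    return findings


# Windows-1252 bytes mislabelled as ISO 8859-1: 0x85 becomes U+0085.
mislabelled = b"wait\x85".decode("iso-8859-1")
for index, cp, verdict in audit(mislabelled):
    print(f"index {index}: U+{cp:04X} is {verdict}")
```

Run on the mislabelled sample, the audit reports the stray U+0085 at the
point where the sender can still correct the encoding label.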

