xml-dev - RE: [xml-dev] MSXML DOM Special Chars Less Than 32

To: <xml-dev@lists.xml.org>
Subject: RE: [xml-dev] MSXML DOM Special Chars Less Than 32
From: Tim Bray <tbray@textuality.com>
Date: Fri, 22 Mar 2002 10:13:08 -0800
In-reply-to: <001701c1d18d$36247f40$655169d5@pcukmka>
References: <020001c1d173$dada0510$4bc8a8c0@AlletteSystems.com>

At 10:34 AM 22/03/02 +0000, Michael Kay wrote:
>I don't want to dumb XML down. But we do sometimes need to store data (e.g.
>WebDAV property values) which can potentially contain characters that are
>not permitted in XML. In fact, it's very unlikely that a WebDAV property
>value will contain such a character, but the software still needs to allow
>for the possibility.
>
>I don't personally see any good reason why C0 (and C1) characters shouldn't
>be permitted XML characters

I am very strongly against this, and not just for the excellent
statistical reasons that Rick raised.  XML's greatest strength
is as an interchange format; as such it offers a degree of
cross-system interoperability that nothing else quite
achieves in my experience.

The interoperability is partly due to the fact that the content
consists of Unicode characters, which have widely agreed-upon
semantics as documented in Unicode and ISO 10646.  However, the
C0 controls do *not* have such widely agreed-upon semantics (what
do ETX and EOT mean to you today?).  And in general binary data
is less interoperable than textual data.  Thus it has no place
in XML.

If you need to interchange binary data (and we all do) that's fine, 
but don't claim doing so is interoperable and don't try to dress
it up in XML clothes unless you're willing to base64 it or otherwise
clearly mark it as an opaque blob.

The fact that the C1 characters are currently allowed in XML
is simply a design error.  I'd love to fix it but it's probably
too late.
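
For concreteness, here is a minimal Python sketch of the XML 1.0
"Char" production, which is what makes most C0 controls illegal while
letting the C1 range (U+0080 to U+009F) through.  The function names
is_xml10_char and illegal_chars are hypothetical, chosen only for this
illustration; the ranges are the ones given in the XML 1.0
Recommendation.

```python
def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production.

    Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
           | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        cp in (0x9, 0xA, 0xD)          # tab, LF, CR: the only legal C0 controls
        or 0x20 <= cp <= 0xD7FF        # includes the C1 range U+0080..U+009F
        or 0xE000 <= cp <= 0xFFFD      # skips surrogates, U+FFFE and U+FFFF
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


def illegal_chars(text: str) -> list[tuple[int, str]]:
    """List (index, 'U+XXXX') for every character XML 1.0 forbids."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if not is_xml10_char(ord(ch))
    ]


# ETX and EOT are rejected, while a C1 control such as U+0085 passes.
print(illegal_chars("ok\x03\x04"))   # [(2, 'U+0003'), (3, 'U+0004')]
print(is_xml10_char(0x85))           # True
```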

Finally, the notion that allowing C0 & C1 chars helps with
binary data packing seems kind of bogus to me anyhow - in
any case you're going to have to filter to deal with U+0000, not
to mention "<" and "&", right?  Wouldn't it be about the
same amount of work, and a lot cleaner, just to throw this
stuff into base64?  -Tim
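
A rough Python sketch of the base64 route, using only the standard
library.  The element name "property", the "name" and "encoding"
attributes, and the helpers wrap_blob and unwrap_blob are all
assumptions made for this illustration; nothing in WebDAV or XML
prescribes them.  The point is only that the payload is marked as an
opaque blob and that no per-character filtering is needed, since the
base64 alphabet contains no "<", "&" or control characters.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_blob(name: str, payload: bytes) -> str:
    """Serialize arbitrary bytes as a clearly labelled base64 element."""
    elem = ET.Element("property", {"name": name, "encoding": "base64"})
    # base64 output is plain ASCII, so it is always legal XML content.
    elem.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def unwrap_blob(xml_text: str) -> bytes:
    """Recover the original bytes from an element built by wrap_blob."""
    elem = ET.fromstring(xml_text)
    if elem.get("encoding") != "base64":
        raise ValueError("element is not marked as base64")
    return base64.b64decode(elem.text or "")


# Bytes that could never appear literally in XML 1.0 text: NUL, ETX,
# EOT, plus "<" and "&", which would otherwise need escaping.
payload = bytes([0x00, 0x03, 0x04, 0x3C, 0x26, 0x9F])
document = wrap_blob("example", payload)
assert unwrap_blob(document) == payload
```

The receiving side sees well-formed XML either way; what it gains is
an explicit signal that the content is binary and must be decoded
before use.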

Follow-Ups:
- Re: [xml-dev] MSXML DOM Special Chars Less Than 32
  - From: John Cowan <jcowan@reutershealth.com>
- RE: [xml-dev] MSXML DOM Special Chars Less Than 32
  - From: "Michael Kay" <michael.h.kay@ntlworld.com>

References:
- Re: [xml-dev] MSXML DOM Special Chars Less Than 32
  - From: "Rick Jelliffe" <ricko@allette.com.au>
- RE: [xml-dev] MSXML DOM Special Chars Less Than 32
  - From: "Michael Kay" <michael.h.kay@ntlworld.com>
