At 10:34 AM 22/03/02 +0000, Michael Kay wrote:
>I don't want to dumb XML down. But we do sometimes need to store data (e.g.
>WebDAV property values) which can potentially contain characters that are
>not permitted in XML. In fact, it's very unlikely that a WebDAV property
>value will contain such a character, but the software still needs to allow
>for the possibility.
>I don't personally see any good reason why C0 (and C1) characters shouldn't
>be permitted XML characters
I am very strongly against this, and not just for the excellent
statistical reasons that Rick raised. XML's greatest strength
is as an interchange format; as such it offers a degree of
cross-system interoperability that nothing else quite
achieves in my experience.
The interoperability is partly due to the fact that the content
consists of Unicode characters, which have widely agreed-on
semantics as documented in Unicode and ISO 10646. However, the
C0 controls do *not* have such widely agreed-on semantics (what
do ETX and EOT mean to you today?). And in general binary data
is less interoperable than textual data. Thus it has no place in XML.
If you need to interchange binary data (and we all do) that's fine,
but don't claim doing so is interoperable and don't try to dress
it up in XML clothes unless you're willing to base64 it or otherwise
clearly mark it as an opaque blob.
The fact that the C1 characters are currently allowed in XML
is simply a design error. I'd love to fix it, but it's probably too late.
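To make the C0/C1 asymmetry concrete, here is a small sketch of the XML 1.0 `Char` production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. The function names `is_xml10_char` and `illegal_chars` are hypothetical helpers for illustration, not part of any library. Note that C0 controls such as ETX (U+0003) fall outside the production, while C1 controls such as NEL (U+0085) fall inside the [#x20-#xD7FF] range.

```python
def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)          # the only C0 controls allowed: TAB, LF, CR
        or 0x20 <= cp <= 0xD7FF        # includes the C1 range U+0080..U+009F
        or 0xE000 <= cp <= 0xFFFD      # excludes the surrogates and U+FFFE/U+FFFF
        or 0x10000 <= cp <= 0x10FFFF
    )


def illegal_chars(text: str) -> list[str]:
    """List (as U+XXXX strings) every character in text that XML 1.0 forbids."""
    return [f"U+{ord(c):04X}" for c in text if not is_xml10_char(ord(c))]


print(is_xml10_char(0x03))  # ETX, a C0 control: False
print(is_xml10_char(0x85))  # NEL, a C1 control: True
print(illegal_chars("a\x03b\x85"))  # ['U+0003']
```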
Finally, the notion that allowing C0 & C1 chars helps with
binary data packing seems kind of bogus to me anyhow - in
any case you're going to have to filter to deal with U+0000 not
to mention "<" and "&", right? Wouldn't it be about the
same amount of work, and a lot cleaner, just to throw this
stuff into base64? -Tim
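A minimal sketch of the base64 route, assuming Python's standard `base64` and `xml.etree.ElementTree` modules. The element name and the `encoding="base64"` attribute are hypothetical conventions chosen here to mark the content as an opaque blob; they are not taken from WebDAV or any other spec. The point is that `<` and `&` could be escaped, but U+0000 cannot be represented in XML 1.0 at all (not even as `&#0;`), whereas the base64 alphabet needs no escaping.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_blob(tag: str, data: bytes) -> str:
    """Serialize arbitrary bytes as a base64 text node, marked as opaque."""
    elem = ET.Element(tag, {"encoding": "base64"})  # hypothetical marker attribute
    elem.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def unwrap_blob(xml_text: str) -> bytes:
    """Recover the original bytes from an element produced by wrap_blob."""
    elem = ET.fromstring(xml_text)
    if elem.get("encoding") != "base64":
        raise ValueError("expected a base64-marked element")
    return base64.b64decode(elem.text or "")  # empty element -> empty bytes


# NUL, ETX, markup characters and a C1 byte all survive the round trip.
payload = b"\x00\x03<&\x85 raw bytes"
doc = wrap_blob("value", payload)
print(doc)
assert unwrap_blob(doc) == payload
```

No per-character filtering is needed on either side, which is the "same amount of work, and a lot cleaner" trade-off the post describes.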