OASIS Mailing List Archives

   RE: [xml-dev] MSXML DOM Special Chars Less Than 32

> The interoperability is partly due to the fact that the content
> consists of Unicode characters, which have widely agreed on
> semantics as documented in Unicode and ISO 10646.  However, the
> C0 controls do *not* have such widely agreed on semantics (what
> do ETX and EOT mean to you today?).  And in general binary data
> is less interoperable than textual data.  Thus it has no place
> in XML.

We often use XML to transport information whose semantics we do not
understand.

<a>&#x02;&#x02;</a>
may well be gibberish, but so might
<a>oaaosiuc</a>

Both might mean something to somebody. It's not my job to judge; I'm only
the messenger.
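A quick check with Python's standard-library parser (expat, which implements XML 1.0) shows where the line is drawn today: the "gibberish" letters parse, but the `&#x02;` references are rejected, because XML 1.0 allows character references to only three C0 controls (TAB, LF, CR). XML 1.1 relaxes this. `is_well_formed` is a hypothetical helper written for this sketch.

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    """Return True if the stdlib (XML 1.0) parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        # expat reports "reference to invalid character number" for &#x02;
        return False

print(is_well_formed("<a>oaaosiuc</a>"))      # True
print(is_well_formed("<a>&#x02;&#x02;</a>"))  # False
```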

>
> If you need to interchange binary data (and we all do) that's fine,
> but don't claim doing so is interoperable and don't try to dress
> it up in XML clothes unless you're willing to base64 it or otherwise
> clearly mark it as an opaque blob.

For occasional C0 characters appearing in the middle of printable text, the
XML character reference mechanism seems to be a good way of marking them
clearly, while leaving the surrounding text readable.
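A minimal sketch of the character-reference approach, assuming the output will be read by an XML 1.1 processor (a document declaring `version="1.1"`), since XML 1.0 only permits references to TAB, LF and CR among the C0 controls. The function name `escape_text` is hypothetical. NUL is rejected because neither XML version can represent it.

```python
def escape_text(s: str) -> str:
    """Escape s for use as XML element content.

    C0 control characters other than TAB and LF are written as hex
    character references. Apart from &#x0d;, such references are only
    well-formed under XML 1.1, not XML 1.0.
    """
    out = []
    for ch in s:
        cp = ord(ch)
        if cp == 0:
            raise ValueError("NUL cannot be represented in XML 1.0 or 1.1")
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif cp < 0x20 and ch not in "\t\n":
            # C0 control, including CR (which a parser would otherwise normalize to LF)
            out.append(f"&#x{cp:02x};")
        else:
            out.append(ch)
    return "".join(out)

print(escape_text("ab\x02cd"))  # ab&#x02;cd
```

Only the control character itself changes; the printable text around it passes through untouched.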

> Wouldn't it be about the
> same amount of work, and a lot cleaner, just to throw this
> stuff into base64?

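If C0 characters occur in only 0.01% of the character strings that you
actually transmit, then base64 encoding is a heavy price to pay: it grows
every string by about a third and makes all of it unreadable, just to
protect a handful of control characters.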
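To put rough numbers on the trade-off, here is a back-of-the-envelope comparison on a hypothetical corpus of 10,000 ASCII strings of 30 characters each, exactly one of which (0.01%) contains a single control character. Base64 emits 4 characters for every 3 input bytes, so every string grows; a reference such as `&#x02;` costs 5 extra characters, and only where a control character actually occurs. The corpus and helper names are made up for illustration.

```python
import base64
import re

# Hypothetical corpus: 10,000 ASCII strings of 30 characters each;
# exactly one of them (0.01%) contains a single C0 control character.
corpus = ["x" * 30 for _ in range(10_000)]
corpus[0] = "x" * 14 + "\x02" + "x" * 15

# C0 controls other than TAB, LF and CR (NUL excluded as unrepresentable).
C0 = re.compile(r"[\x01-\x08\x0b\x0c\x0e-\x1f]")

def as_charrefs(s: str) -> str:
    # Replace each matching control with a hex character reference.
    # (&, < and > are ignored here because the corpus contains none.)
    return C0.sub(lambda m: f"&#x{ord(m.group()):02x};", s)

def as_base64(s: str) -> str:
    # Encode the whole string as an opaque base64 blob.
    return base64.b64encode(s.encode("utf-8")).decode("ascii")

raw = sum(len(s) for s in corpus)
refs = sum(len(as_charrefs(s)) for s in corpus)
b64 = sum(len(as_base64(s)) for s in corpus)
print(raw, refs, b64)  # 300000 300005 400000
```

With these made-up figures, character references add 5 characters in total, while base64 adds 100,000.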

Michael Kay
Software AG
home: Michael.H.Kay@ntlworld.com
work: Michael.Kay@softwareag.com

Copyright 2001 XML.org. This site is hosted by OASIS