On Wed, Dec 19, 2001 at 02:16:09AM -0800, Dare Obasanjo wrote:
> ...but there are issues such as
> a.) XML is not as efficient as a serialization technology ...
> b.) Using XML involves ... half a dozen buzzword technologies...
> c.) XML was originally designed to deal with text primarily and not binary
> data ...
I guess I don't see why any of the above are good reasons not to allow
binary data. Sure, it's not optimal. Sure, XML has lots of other related
standards. Sure, XML was *originally* intended for text.
But what about now?
Is there a good reason why PCDATA should not be allowed to contain the
bell (BEL) or escape (ESC) character? Why is it always evil to allow
terminal escape sequences to be embedded in an XML document? Why is it
important for XML to *not* permit it to be used for things that it was
not originally intended for?
With XML popping up all over the place now, I think it's actually worse to
single out a subset of characters from the Unicode spec as not legal in
XML. Some C0 controls are allowed after all - horizontal tab is allowed
(but not vertical tab). All C1 controls are allowed.
Hmmm. I must admit I just confused myself. Looking at the Unicode 3.0
book we have, it defines C0 controls (0x00-0x1f) and C1 controls
(0x80-0x9f). The XML 1.0 spec says it allows "any Unicode character
excluding the surrogate blocks, FFFE, and FFFF". It then lists #x9, #xA,
#xD, #x20-#xD7FF etc. So why are *some* C0 control Unicode chars
"not Unicode", but *all* C1 control Unicode chars "are Unicode"?
It seems totally self-contradictory and bizarre to me. But I am sure
this has all been argued before somewhere. (I was reading section 2.2
of the XML 1.0 spec by the way.)
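To make the asymmetry concrete, here is a rough Python sketch of the Char
production from section 2.2. The function name is_xml10_char is made up for
illustration; the ranges are the ones the production lists (#x9, #xA, #xD,
#x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF).

```python
def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp falls in the XML 1.0 Char ranges.

    Hypothetical helper: the ranges are transcribed from the Char
    production in section 2.2 of the XML 1.0 spec.
    """
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # everything up to the surrogates
        or 0xE000 <= cp <= 0xFFFD      # skips surrogates, FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


# C0 controls are 0x00-0x1F and C1 controls are 0x80-0x9F.
c0_allowed = [cp for cp in range(0x00, 0x20) if is_xml10_char(cp)]
c1_allowed = [cp for cp in range(0x80, 0xA0) if is_xml10_char(cp)]

print("C0 allowed:", [hex(cp) for cp in c0_allowed])
print("C1 allowed:", len(c1_allowed), "of 32")
```

Under these ranges only three of the 32 C0 controls get through, BEL (0x07),
vertical tab (0x0B) and ESC (0x1B) are all rejected, and every C1 control is
accepted.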
XML is defined in terms of Unicode. Why not therefore allow all
Unicode characters in PCDATA of XML?
Alan