Okay, I am thoroughly confused. And I apologize for asking a question that
has undoubtedly come up more than once before--but I have at least done a
Google search and not turned up anything that looks like an answer to this.
I am trying to figure out the precise relationship between the XML character
set and the Unicode character set. I understand what specific characters are
and aren't allowed in XML--the formal grammar is perfectly clear in that
regard. But I'm bothered by the prose description in section 2.2 of the XML
spec, which appears to be inconsistent with the EBNF. Specifically I'm
referring to the sentence that reads:
    "Legal characters are tab, carriage return, line feed, and the legal
    characters of Unicode and ISO/IEC 10646."
This implies that tab, carriage return, and line feed are not Unicode
characters. But as far as I can tell they are. They're in the code charts at
unicode.org, along with all the other control characters below 0x0020; I've
also been looking at the Unicode 3.0 standard (esp. section 2.8 on control
characters) and the Unicode FAQ, and I can't find any suggestion that those
control characters are somehow not characters, or not legal characters.
Yet if the sentence I quoted above were incorrect, surely someone would have
noticed by now and filed an erratum; there is no such erratum. So what is
the point I have failed to grasp here?
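
To make the comparison concrete, here is a rough Python sketch of the Char production from section 2.2 as I read it, set beside what Unicode itself says about the characters below 0x0020. The names XML_CHAR_RANGES, is_xml_char, and is_unicode_control are my own, made up for illustration; only the code point ranges come from the spec, and the check relies on the standard unicodedata module.

```python
import unicodedata

# Code point ranges of the Char production in XML 1.0, section 2.2.
# The list name and the helper names below are illustrative, not from the spec.
XML_CHAR_RANGES = [
    (0x9, 0x9),           # tab
    (0xA, 0xA),           # line feed
    (0xD, 0xD),           # carriage return
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]


def is_xml_char(cp: int) -> bool:
    """Return True if the code point matches the Char production."""
    return any(lo <= cp <= hi for lo, hi in XML_CHAR_RANGES)


def is_unicode_control(cp: int) -> bool:
    """Return True if Unicode assigns the code point general category Cc."""
    return unicodedata.category(chr(cp)) == "Cc"


# Code points below 0x0020 that the EBNF admits.
c0_allowed = [cp for cp in range(0x20) if is_xml_char(cp)]

# Code points below 0x0020 that Unicode treats as control characters.
c0_controls = [cp for cp in range(0x20) if is_unicode_control(cp)]
```

Under this reading, c0_allowed holds only tab, line feed, and carriage return, while c0_controls holds every code point below 0x0020, those three included. That is the mismatch with the prose that I am asking about.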
--
Matt Gushee
Englewood, Colorado, USA
mgushee@havenrock.com
http://www.havenrock.com/

When a nation follows the Way,
Horses bear manure through
its fields;
When a nation ignores the Way,
Horses bear soldiers through
its streets.
--Lao Tzu (Peter Merel, trans.)