- From: Tim Bray <email@example.com>
- To: John Cowan <firstname.lastname@example.org>, XML Dev <email@example.com>
- Date: Mon, 01 Mar 1999 10:23:57 -0800
At 12:58 PM 3/1/99 -0500, John Cowan wrote:
>> For instance, the Sinhala character set was not in Unicode 2.0 but will be
>> in 3.0. How do I get one of those characters in an XML document ?
>There is a discrepancy between the prose, which says "legal Unicode/10646
>characters" and references old versions of these standards, and
>the BNF, which says the Char production handles everything except
>known control characters (and even some of those).
John's right. And it's not the Sinhala that first brought it home, but
the Euro character, which is clearly OK per the Char production but isn't
a "legal yadda yadda yadda" per the particular amendment of 10646/Unicode
that the XML spec references. The W3C has some I18n heavies trying
to figure out what to do - life is made more complicated by the fact
that the Unicode people and the IETF i18n people don't always point
in the same direction, sigh; did you know the BOM was legal in UTF-8?
And of course by the fact that Unicode/10646 is a moving target.
But the bottom line is (see the public errata to the XML spec)
that the Char production is normative; both in theory and in practice,
XML processors pass through everything in that range. In practice,
I've never actually seen anything outside of the BMP, but the
experts agree they're showing up real soon now.
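For the curious, here is a rough sketch of what "everything in that range" means. The ranges are transcribed from the Char production in XML 1.0; the names `CHAR_RANGES` and `is_xml_char` are made up for illustration and don't come from any particular parser.

```python
# Code point ranges transcribed from the Char production of XML 1.0:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# CHAR_RANGES and is_xml_char are hypothetical names used only for this sketch.
CHAR_RANGES = [
    (0x9, 0xA),           # tab and line feed
    (0xD, 0xD),           # carriage return
    (0x20, 0xD7FF),       # most of the BMP, below the surrogate block
    (0xE000, 0xFFFD),     # rest of the BMP, excluding U+FFFE and U+FFFF
    (0x10000, 0x10FFFF),  # everything beyond the BMP
]


def is_xml_char(code_point: int) -> bool:
    """Return True if the code point falls in one of the Char ranges."""
    return any(low <= code_point <= high for low, high in CHAR_RANGES)


euro_ok = is_xml_char(0x20AC)      # the Euro sign
sinhala_ok = is_xml_char(0x0D85)   # a code point in the Sinhala block
null_ok = is_xml_char(0x0)         # a control character the production excludes
```

The check only looks at code point values, so a character passes regardless of which Unicode/10646 version first assigned it.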
How to get it in? Something like &#x10333; I expect. To a programmer,
it'll show up as either two UTF-16 surrogates or a 4+-byte UTF-8 sequence,
neither of which will look in the slightest like hex 10333. -Tim
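A small sketch of the point about encodings, assuming Python's standard-library `xml.etree.ElementTree` as the parser (any conforming processor should behave the same way). The one-element document and the variable names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical one-element document carrying a character reference
# to code point hex 10333, which lies outside the BMP.
doc = "<g>&#x10333;</g>"

text = ET.fromstring(doc).text
code_point = ord(text)

# The same character as a programmer would meet it in each encoding.
utf16_bytes = text.encode("utf-16-be")  # two 16-bit surrogates
utf8_bytes = text.encode("utf-8")       # a four-byte sequence

print(hex(code_point))    # 0x10333
print(utf16_bytes.hex())  # d800df33
print(utf8_bytes.hex())   # f0908cb3
```

Neither byte string contains anything resembling 10333, so code that inspects raw bytes has to decode them before comparing against a code point.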
xml-dev: A list for W3C XML Developers. To post, mailto:firstname.lastname@example.org
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1