   Re: XML and special Characters : unicode v3.0 ?
  • From: Tim Bray <tbray@textuality.com>
  • To: John Cowan <cowan@locke.ccil.org>, XML Dev <xml-dev@ic.ac.uk>
  • Date: Mon, 01 Mar 1999 10:23:57 -0800

At 12:58 PM 3/1/99 -0500, John Cowan wrote:
>> For instance, the Sinhala character set was not in Unicode 2.0 but will be
>> in 3.0. How do I get one of those characters in an XML document ? 
>There is a discrepancy between the prose, which says "legal Unicode/10646
>characters" and references old versions of these standards, and
>the BNF, which says the Char production handles everything except
>known control characters (and even some of those).

John's right.  And it's not the Sinhala that first brought it home, but
the Euro character, which is clearly OK per production [2] but isn't
a "legal yadda yadda yadda" per the particular amendment of 10646/Unicode
that the XML spec references.  The W3C has some I18n heavies trying
to figure out what to do - life is made more complicated by the fact
that the Unicode people and the IETF i18n people don't always point
in the same direction, sigh; did you know the BOM was legal in UTF-8?
And of course by the fact that Unicode/10646 is a moving target.

But the bottom line is (see the public errata to the XML spec)
that production [2] is normative; both in theory and in practice,
XML processors pass through everything in that range.  In practice,
I've never actually seen any characters outside of the BMP, but the
experts agree they'll be showing up real soon now.
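
To make the "everything in that range" point concrete, here is a minimal sketch in Python of a membership test against production [2] (Char) as given in the XML 1.0 Recommendation. The names XML_CHAR_RANGES and is_xml_char are hypothetical, chosen only for illustration, and U+0D85 is simply one code point from the Sinhala block.

```python
# Ranges of production [2] (Char) in XML 1.0:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# XML_CHAR_RANGES and is_xml_char are illustrative names, not a library API.
XML_CHAR_RANGES = [
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]


def is_xml_char(code_point: int) -> bool:
    """Return True if the code point matches the Char production."""
    return any(low <= code_point <= high for low, high in XML_CHAR_RANGES)


# Euro sign, a Sinhala-block code point, a non-BMP code point,
# then two values the production excludes (NUL and a lone surrogate).
for cp in (0x20AC, 0x0D85, 0x10333, 0x0, 0xD800):
    print(f"U+{cp:04X}: {is_xml_char(cp)}")
```

The check looks only at the numeric range, which is why the Euro and Sinhala code points pass regardless of which Unicode version first assigned them.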

How to get it in? Something like &#x10333; I expect.  To a programmer,
it'll show up either as two UTF-16 surrogates or as a 4-byte UTF-8 sequence,
neither of which will look in the slightest like hex 10333.  -Tim
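
A minimal sketch of what that looks like, assuming Python 3 and its standard-library ElementTree parser (the element name doc is made up for the example). It parses the character reference and then prints the code point alongside its UTF-16 and UTF-8 encodings.

```python
import xml.etree.ElementTree as ET

# Parse a tiny document whose only content is the character reference.
text = ET.fromstring("<doc>&#x10333;</doc>").text

code_point = ord(text)                       # the scalar value, 0x10333
utf16_units = text.encode("utf-16-be").hex() # two 16-bit surrogate code units
utf8_bytes = text.encode("utf-8").hex()      # four bytes

print(f"code point: U+{code_point:X}")
print(f"UTF-16:     {utf16_units}")
print(f"UTF-8:      {utf8_bytes}")
```

The surrogate pair comes out as D800 DF33 and the UTF-8 form as F0 90 8C B3, so neither byte sequence contains anything resembling 10333.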
