- From: Tim Bray <tbray@textuality.com>
- To: "Paul W. Abrahams" <abrahams@valinet.com>, XMLDev list <xml-dev@ic.ac.uk>
- Date: Thu, 16 Sep 1999 17:37:22 -0700
At 06:12 PM 9/16/99 -0400, Paul W. Abrahams wrote:
>The XML 1.0 spec explicitly excludes the Unicode surrogate characters
>from XML documents (production 2). It now seems, from information
>I've picked up on the Unicode web site, that surrogate characters are
>likely to play a more important role in the future, since the
>available 16-bit characters are almost all used up. (Unicode 2.0 has
>18,134 spares but Unicode 3.0 has only 7827 spares. The trend is
>clear.)
No. Production [2] says
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
| [#xE000-#xFFFD] | [#x10000-#x10FFFF]
This follows the Unicode model in allowing 17 planes of 64K characters
each, i.e. about a million characters. For this to work in UTF-16, you
need surrogate pairs. What XML rules out is *characters* whose numeric
value is that of one-half of a surrogate pair. There will never be any
such characters precisely because those values are reserved for use in
surrogate pairs. That's why XML rules them out.
-Tim
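To make the distinction concrete, here is a minimal Python sketch, not taken from the XML spec or any parser. The function names is_xml_char and to_utf16_units are hypothetical, chosen only for illustration. The first checks a code point against production [2]; the second shows how a character above #xFFFF becomes a surrogate pair in UTF-16, which is why the individual surrogate values can never be characters themselves.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches production [2] (Char) of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def to_utf16_units(cp: int) -> list:
    """Encode one code point as a list of UTF-16 code units (illustrative helper)."""
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogate values are reserved for pairs; they are not characters.
        raise ValueError("surrogate code points cannot be encoded on their own")
    if cp < 0x10000:
        return [cp]  # Plane 0: a single 16-bit unit
    v = cp - 0x10000  # 20-bit offset into planes 1 through 16
    high = 0xD800 + (v >> 10)   # top 10 bits go into the high surrogate
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits go into the low surrogate
    return [high, low]


# 17 planes of 0x10000 code points each: 0x000000 through 0x10FFFF.
TOTAL_CODE_POINTS = 17 * 0x10000
```

Under these assumptions, is_xml_char(0x10000) is true while is_xml_char(0xD800) is false, and to_utf16_units(0x10000) gives the pair 0xD800, 0xDC00. The character is legal in XML; only the lone halves of the pair are excluded.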