xml-dev - Re: Unicode surrogate block in XML?

From: Tim Bray <tbray@textuality.com>
To: "Paul W. Abrahams" <abrahams@valinet.com>, XMLDev list <xml-dev@ic.ac.uk>
Date: Thu, 16 Sep 1999 17:37:22 -0700

At 06:12 PM 9/16/99 -0400, Paul W. Abrahams wrote:
>The XML 1.0 spec explicitly excludes the Unicode surrogate characters
>from XML documents (production 2).  It now seems, from information
>I've picked up on the Unicode web site, that surrogate characters are
>likely to play a more important role in the future, since the
>available 16-bit characters are almost all used up.  (Unicode 2.0 has
>18,134 spares but Unicode 3.0 has only 7827 spares.  The trend is
>clear.)

No. Production [2] says

[2] Char ::=  #x9 | #xA | #xD | [#x20-#xD7FF]
              | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
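
Read as a test on code points, production [2] is a handful of range checks. Below is a minimal Python sketch. The names `is_xml_char` and `is_xml_text` are hypothetical, and the sketch assumes the text has already been decoded to code points. The gap between #xD7FF and #xE000 is exactly the surrogate block.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches XML 1.0 production [2] (Char)."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        # Nothing from #xD800 to #xDFFF (the surrogate block) is accepted.
        or 0xE000 <= cp <= 0xFFFD      # #xFFFE and #xFFFF are also excluded
        or 0x10000 <= cp <= 0x10FFFF   # the 16 supplementary planes
    )


def is_xml_text(text: str) -> bool:
    """Check every code point of an already-decoded string (hypothetical helper)."""
    return all(is_xml_char(ord(ch)) for ch in text)


print(is_xml_text("caf\u00e9 \U0001F600"))  # True: U+1F600 is a real character
print(is_xml_text("a\ud800b"))              # False: a lone surrogate value
```

Supplementary characters such as U+1F600 pass. Only a bare surrogate value fails, and in decoded text that can only come from a broken encoder.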

This follows the Unicode model in allowing 17 planes of 64K characters
each, i.e. about a million characters (#x0-#x10FFFF).  For this to work in
UTF-16, you need surrogate pairs.  What XML rules out is *characters* whose
numeric value is that of one half of a surrogate pair (#xD800-#xDFFF).
There will never be any such characters, precisely because those values
are reserved for use in surrogate pairs.  That's why XML rules them out. -Tim
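
The following minimal Python sketch illustrates the surrogate-pair mechanism using the standard UTF-16 arithmetic. The helper names are hypothetical. A supplementary code point is offset by #x10000, and its 20 bits are split between a high unit in #xD800-#xDBFF and a low unit in #xDC00-#xDFFF. The halves exist only in the encoding. The character in the document is the recombined code point, which production [2] accepts.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 code units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    offset = cp - 0x10000              # 20-bit value, 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)     # top 10 bits -> 0xD800..0xDBFF
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> 0xDC00..0xDFFF
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Both code units fall in #xD800-#xDFFF, but the character is U+1F600.
high, low = to_surrogate_pair(0x1F600)
print(f"U+1F600 -> {high:04X} {low:04X}")  # U+1F600 -> D83D DE00
assert from_surrogate_pair(high, low) == 0x1F600
```

Python's own codec agrees: `chr(0x1F600).encode("utf-16-be")` yields the bytes D8 3D DE 00.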

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)