   Re: Character Range: surrogate blocks

  • From: Tim Bray <tbray@textuality.com>
  • To: emberson@faslab.com (Richard Emberson), xml-dev@ic.ac.uk
  • Date: Sat, 17 Oct 1998 18:49:54 -0700

At 03:18 PM 10/17/98 -0700, Richard Emberson wrote:

>Now in production rule #2 titled Character Range 
>surrogate blocks are explicitly excluded (along 
>with FFFF and FFFE). 

No Unicode characters have numeric values in the surrogate
blocks (#xD800-#xDFFF); the blocks exist only so that non-BMP
characters can be encoded unambiguously in 16-bit units.  The
productions in the spec describe the characters themselves,
not any particular encoding of them.
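
To make the exclusion concrete, here is a minimal sketch in Python of
production [2] (Char) from the XML 1.0 Recommendation. The function name
is_xml_char is illustrative only, not taken from any library; it tests
code points, independent of how they are encoded.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches XML 1.0 production [2], Char."""
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # BMP below the surrogate blocks
        or 0xE000 <= cp <= 0xFFFD      # BMP above them, minus FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF   # characters beyond the BMP
    )


# The surrogate blocks and FFFE/FFFF fall in the gaps between the ranges.
for cp in (0x41, 0xD800, 0xDFFF, 0xFFFE, 0xFFFF, 0x10001):
    print(f"#x{cp:X}: {is_xml_char(cp)}")
```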

>There are the extra, beyond 16-bit, characters specified
>by the spec in production rule #2 as "[#x10000-#x10FFFF]".
>Is this how Unicode characters that use the surrogate
>blocks get represented in an XML document? 

Yes.  For example, the character reference &#x10001; is legal.
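
One way to check this against a real parser is the sketch below, which
assumes Python 3 and its standard xml.etree.ElementTree module (built on
expat). The element name "a" is arbitrary. A reference to a non-BMP
character is accepted as a single character, while a reference to a
surrogate value is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

# A character reference beyond the BMP parses to one character.
text = ET.fromstring("<a>&#x10001;</a>").text
print(len(text), hex(ord(text)))

# A reference into the surrogate blocks is not a legal Char.
try:
    ET.fromstring("<a>&#xD800;</a>")
    surrogate_accepted = True
except ET.ParseError as err:
    surrogate_accepted = False
    print("rejected:", err)
```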

>Short of getting a copy of the Unicode 2.0 spec, is there 
>anywhere where the conversion algorithm is documented?

I strongly recommend getting a copy of the spec.  It's fairly
priced and a very fine piece of work. -Tim
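
For reference, the surrogate arithmetic itself is short. The following is
a minimal sketch in Python of the UTF-16 conversion in both directions;
the function names to_surrogates and from_surrogates are illustrative
only, and the spec remains the authoritative description.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a non-BMP code point into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points #x10000-#x10FFFF use surrogates")
    v = cp - 0x10000                 # 20-bit value
    high = 0xD800 + (v >> 10)        # top 10 bits
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Combine a (high, low) surrogate pair back into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x10001)
print(f"{high:04X} {low:04X}")       # the pair for &#x10001;
```

The result can be cross-checked against Python's own codec:
chr(0x10001).encode("utf-16-be") yields the same two 16-bit units.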


