- From: Tim Bray <email@example.com>
- To: firstname.lastname@example.org (Richard Emberson), email@example.com
- Date: Sat, 17 Oct 1998 18:49:54 -0700
At 03:18 PM 10/17/98 -0700, Richard Emberson wrote:
>Now in production rule #2 titled Character Range
>surrogate blocks are explicitly excluded (along
>with FFFF and FFFE).
There are no Unicode characters whose numeric values
are those which appear in the surrogate blocks; the blocks
exist only to ensure the possibility of encoding non-BMP
characters unambiguously. The productions in the spec
describe the characters themselves, not any particular
encoding of them.
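
[Editorial illustration, not part of the original message.] The point that the production describes characters rather than encoded units can be sketched as a check on code points. The function name is_xml_char is hypothetical; the ranges are those of production [2] (Char) in XML 1.0.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches XML 1.0 production [2] (Char).

    is_xml_char is a hypothetical helper name used only for illustration.
    """
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # BMP below the surrogate blocks
        or 0xE000 <= cp <= 0xFFFD      # BMP above them, excluding FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF   # characters beyond the BMP
    )
```

Under this sketch, 0xD800 and 0xFFFE are rejected while 0x10001 is accepted, whatever encoding the document happens to use.
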
>There are the extra, beyond 16-bit, characters specified
>by the spec in production rule #2 as "[#x10000-#x10FFFF]".
>Is this how Unicode characters that use the surrogate
>blocks get represented in an XML document?
Yes. For example, it's legal to have &#x10001;
>Short of getting a copy of the Unicode 2.0 spec, is there
>anywhere where the conversion algorithm is documented?
I strongly recommend getting a copy of the spec. It's fairly
priced and a very fine piece of work.
 -Tim
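
[Editorial illustration, not part of the original message.] For readers without the spec to hand, the UTF-16 surrogate conversion asked about above can be sketched as follows. This is a minimal sketch of the standard arithmetic, not text from the Unicode book; the function names to_surrogates and from_surrogates are hypothetical.

```python
from typing import Tuple


def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a non-BMP code point into a (high, low) UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use surrogates")
    v = cp - 0x10000                   # 20-bit offset above the BMP
    high = 0xD800 + (v >> 10)          # top 10 bits go in the high surrogate
    low = 0xDC00 + (v & 0x3FF)         # bottom 10 bits go in the low surrogate
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

With these definitions the character in the example above, U+10001, corresponds to the pair D800 DC01 in UTF-16, and from_surrogates(0xD800, 0xDC01) gives 0x10001 back.
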
xml-dev: A list for W3C XML Developers. To post, mailto:firstname.lastname@example.org
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:email@example.com)