- From: John Cowan <email@example.com>
- To: XML Dev <firstname.lastname@example.org>
- Date: Mon, 28 Sep 1998 14:50:29 -0400
Tony Graham scripsit:
> Surrogate pairs are not allowed in parsed entities. The production
> for Char excludes the surrogate blocks:
>   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
>            | [#x10000-#x10FFFF]
On the contrary. UTF-16 is a standard representation that XML
systems must accept (clause 4.3.3), and the representation of the
characters #x10000-#x10FFFF in UTF-16 (which is the same as
Unicode 2.x) is precisely a surrogate pair.
Individual surrogate characters are excluded, but they have no meaning
in UTF-16 anyway.
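
A minimal sketch of the arithmetic behind that point, in Python. The function names are illustrative only and come from neither the XML nor the Unicode specification. It splits a scalar value in #x10000-#x10FFFF into its two UTF-16 code units and checks scalar values against the Char production quoted above:

```python
def to_surrogate_pair(scalar):
    """Split a scalar value in #x10000-#x10FFFF into (high, low) UTF-16 code units."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError("scalar value is outside #x10000-#x10FFFF")
    offset = scalar - 0x10000           # 20-bit offset into the supplementary range
    high = 0xD800 + (offset >> 10)      # top 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits select the low surrogate
    return high, low


def is_xml_char(scalar):
    """Test a scalar value against the Char production quoted above."""
    return (scalar in (0x9, 0xA, 0xD)
            or 0x20 <= scalar <= 0xD7FF
            or 0xE000 <= scalar <= 0xFFFD
            or 0x10000 <= scalar <= 0x10FFFF)


high, low = to_surrogate_pair(0x10000)
print(hex(high), hex(low))      # 0xd800 0xdc00
print(is_xml_char(0x10000))     # True: the character itself matches Char
print(is_xml_char(high))        # False: a lone surrogate does not
```

The character matches Char while each of its two code units, taken alone, does not. Char is defined over characters, not over the code units of a particular encoding.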
> You can include non-BMP/non-UCS-2 characters by making numeric
> references to their Unicode Scalar Value (or by using UCS-4).
That works too.
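
A small illustration of the numeric-reference route, assuming Python's standard xml.etree.ElementTree parser. The element name "a" and the choice of #x10000 are arbitrary examples:

```python
import xml.etree.ElementTree as ET

# A numeric character reference to a scalar value outside the BMP.
doc = "<a>&#x10000;</a>"
text = ET.fromstring(doc).text

# The parser hands back a single character whose scalar value is #x10000.
print(len(text), hex(ord(text)))
```

The reference names the scalar value directly, so no surrogate code units appear in the document text.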
John Cowan http://www.ccil.org/~cowan email@example.com
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
xml-dev: A list for W3C XML Developers. To post, mailto:firstname.lastname@example.org
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:email@example.com)