   Re: Mix encodings in a document?

  • From: John Cowan <cowan@locke.ccil.org>
  • To: XML Dev <xml-dev@ic.ac.uk>
  • Date: Mon, 28 Sep 1998 14:50:29 -0400

Tony Graham scripsit:

> Surrogate pairs are not allowed in parsed entities.  The production
> for Char excludes the surrogate blocks:
> [2] Char::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
>             | [#x10000-#x10FFFF]

On the contrary.  UTF-16 is an encoding that every XML processor
must accept (clause 4.3.3), and in UTF-16 (the encoding form of
Unicode 2.x) each of the characters #x10000-#x10FFFF is represented
by precisely a surrogate pair.

What the Char production excludes is the individual surrogate code
points (#xD800-#xDFFF) as characters, but an unpaired surrogate has
no meaning in UTF-16 anyway.
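
To make the distinction concrete, here is a rough sketch in Python (my choice of language, nothing in the XML spec depends on it). The names is_xml_char and to_surrogate_pair are made up for illustration: the first transcribes production [2] as quoted above, the second applies the standard UTF-16 arithmetic to a character in #x10000-#x10FFFF.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches production [2] Char."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a character in #x10000-#x10FFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only characters above #xFFFF use surrogate pairs")
    offset = cp - 0x10000               # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


cp = 0x1D11E                            # an arbitrary character above #xFFFF
high, low = to_surrogate_pair(cp)

# The character itself is a Char; neither half of its pair is one.
print(is_xml_char(cp), is_xml_char(high), is_xml_char(low))

# Cross-check against Python's own UTF-16 encoder (big-endian, no BOM).
print(chr(cp).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big"))
```

So Char is a statement about characters, and the surrogate pair only shows up once those characters are serialized as UTF-16.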

> You can include non-BMP/non-UCS-2 characters by making numeric
> references to their Unicode Scalar Value (or by using UCS-4).

That works too.
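
A small sketch of the numeric-reference route, again in Python and assuming its standard xml.etree.ElementTree parser; char_ref and the <note> element are hypothetical names for the example only. The reference is written with the scalar value of the character, not with its two surrogate values.

```python
import xml.etree.ElementTree as ET


def char_ref(cp: int) -> str:
    """Build a hexadecimal numeric character reference for code point cp."""
    return f"&#x{cp:X};"


# One reference to the scalar value #x1D11E (an arbitrary non-BMP character).
doc = f"<note>{char_ref(0x1D11E)}</note>"
text = ET.fromstring(doc).text

print(doc)
print(len(text), hex(ord(text)))        # a single character comes back
```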

John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)


