   Re: Unicode surrogate block in XML?

  • From: "Paul W. Abrahams" <abrahams@valinet.com>
  • To: XMLDev list <xml-dev@ic.ac.uk>
  • Date: Fri, 17 Sep 1999 22:16:29 -0400

Tony Graham (tgraham@mulberrytech.com) wrote on
Fri, 17 Sep 1999 01:15:51 -0400 (EST):

>> In any XML document, you can make numeric references to any Unicode
character in the range #x10000 to #x10FFFF (as well as to any other
legal character number).  These references are independent of the
encoding used in the XML document. <<

Is it really correct to refer to #x10FFFF, say, as a Unicode
character, since Unicode characters are limited to 16 bits?  I'd think
it's necessary here to refer to that as a UCS-4 character.
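
To make the quoted claim concrete, here is a minimal Python sketch (my
own illustration, not something from Tony's message).  It assumes the
standard xml.etree.ElementTree parser; the helper name text_of is
hypothetical.  It parses the same numeric reference from a UTF-8 and a
UTF-16 encoded document, and also shows that #x10FFFF does not fit in
16 bits:

```python
import xml.etree.ElementTree as ET

def text_of(encoding):
    """Parse a tiny document stored in the given encoding (hypothetical helper)."""
    doc = '<?xml version="1.0" encoding="%s"?><a>&#x10000;</a>' % encoding
    # The reference &#x10000; is plain ASCII text in the source; only the
    # byte-level storage of the document differs between encodings.
    return ET.fromstring(doc.encode(encoding)).text

utf8_text = text_of("UTF-8")
utf16_text = text_of("UTF-16")

# Both parses should yield the same single character, number #x10000.
same_character = utf8_text == utf16_text == "\U00010000"

# Number of bits needed to hold the largest legal character number.
bits_needed = (0x10FFFF).bit_length()
```

If the sketch behaves as I expect, same_character is true for both
encodings, and bits_needed exceeds 16, which is the heart of my
question about terminology.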

>> The sequence of #xD800 #xDC00 is the two Surrogate code values that
address #x10000.  That four-byte sequence may occur in a UTF-16
encoded file to represent #x10000.  In contrast, "&#xD800;&#xDC00;" in
an XML document is two illegal character references in a row. <<
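
The pairing described in the quote can be sketched in Python as
follows.  This is only an illustration under my own assumptions: the
function names to_surrogate_pair and parses are hypothetical, and the
parser is the standard xml.etree.ElementTree, which may differ in
detail from other XML processors.

```python
import xml.etree.ElementTree as ET

def to_surrogate_pair(code_point):
    """Split a character number in #x10000..#x10FFFF into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only character numbers above #xFFFF need surrogates")
    offset = code_point - 0x10000        # a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low

def parses(xml_text):
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

high, low = to_surrogate_pair(0x10000)

# The same character as stored in a big-endian UTF-16 file: two 16-bit units.
utf16_bytes = "\U00010000".encode("utf-16-be")

direct_reference_ok = parses("<a>&#x10000;</a>")
surrogate_references_ok = parses("<a>&#xD800;&#xDC00;</a>")
```

Here high and low should come out as #xD800 and #xDC00, utf16_bytes
should be the four bytes D8 00 DC 00, and only the direct reference
should be accepted by the parser.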

I've been trying to fathom the distinction between Unicode and UTF-16,
if there is one, and how these in turn relate to the UCS-2 encoding of
ISO 10646.  There's also the question of whether an XML document can
be stored directly in Unicode, or whether instead it must be stored in
either UTF-8 or UTF-16, as Section 2.2 seems to imply when it says
``all XML processors must accept the UTF-8 and UTF-16 encodings of
10646''.  The latter appears to be the case; but if it isn't, then
how would an XML document be stored directly in Unicode?  I've
pondered both Appendix C of the Unicode Standard and the relevant part
of the FAQ on the Unicode website, and I'm still unclear about all of
this.  (By the way, the FAQ erroneously refers to UTF as the Unicode
Transformation Format rather than the UCS transformation format.)

In any event, thanks, Tony, for your very enlightening response to my
original query.

Paul Abrahams


