   Re: UTF-8 vs UTF-16...?

  • From: David Brownell <david-b@pacbell.net>
  • To: Kragen Sitaker <kragen@pobox.com>
  • Date: Wed, 17 Nov 1999 10:47:32 -0800

Kragen Sitaker wrote:
> 
> According to the latest Unicode book (is it version 2.0?  Or 3.0?)
> UTF-8 does not allow you to encode more than the first 17 planes of ISO
> 10646. 

The Unicode book has a bias:  it covers only the Unicode
aspects of UTF-8.  I've always felt that to be a disservice:
its authors didn't develop or standardize UTF-8, and by
presenting only their own subset they spread misinformation.
(They could at least _mention_ the fact that they're
presenting a Unicode subset of full UTF-8!)

Better information is thankfully freely accessible.  See:

    http://www.ietf.org/rfc/rfc2279.txt

which includes the details of the five- and six-byte encodings.
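
To make the longer forms concrete, here is a minimal sketch of an
encoder that follows the RFC 2279 bit layout for one to six bytes.
The function name utf8_encode_rfc2279 is made up for illustration and
is not part of any library. Python's built-in codec is not used for
the encoding itself because it rejects values above 0x10FFFF.

```python
def utf8_encode_rfc2279(code_point: int) -> bytes:
    """Encode a UCS-4 value with the RFC 2279 scheme (1 to 6 bytes).

    Hypothetical helper for illustration only; it does not reject
    surrogates or other values that Unicode or XML disallow.
    """
    if not 0 <= code_point <= 0x7FFFFFFF:
        raise ValueError("RFC 2279 UTF-8 covers 0 .. 0x7FFFFFFF only")
    if code_point < 0x80:
        return bytes([code_point])  # plain ASCII, one byte

    # (exclusive upper limit, sequence length, lead-byte marker)
    table = [
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ]
    for limit, length, marker in table:
        if code_point < limit:
            break

    out = bytearray()
    value = code_point
    # Emit continuation bytes (10xxxxxx), lowest six bits first.
    for _ in range(length - 1):
        out.append(0x80 | (value & 0x3F))
        value >>= 6
    # Whatever bits remain go into the lead byte.
    out.append(marker | value)
    out.reverse()
    return bytes(out)


if __name__ == "__main__":
    for cp in (0x41, 0x20AC, 0x10FFFF, 0x110000, 0x200000, 0x7FFFFFFF):
        print(hex(cp), utf8_encode_rfc2279(cp).hex())
```

For values that Unicode can express, the output matches the ordinary
UTF-8 encoding; the five- and six-byte forms only appear from
0x200000 upward.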

Note that even the four-byte subset of UTF-8 can encode
values that can't be expressed in Unicode:  four bytes reach
0x1FFFFF, while the 17 planes stop at 0x10FFFF.  A few of the
test cases in the OASIS/NIST test suite (these cases happen to
come from James Clark's XMLTEST package) have such characters,
and any conformant XML processor must report a fatal error when
it sees them.
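
As a rough sketch of the check a processor has to make after decoding,
the function below tests a code point against the Char production of
XML 1.0. The name is_xml_char is invented for illustration; a real
parser would do this inside its decoder and raise a fatal error
instead of returning False.

```python
def is_xml_char(code_point: int) -> bool:
    """Return True if code_point matches the XML 1.0 Char production:

    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


if __name__ == "__main__":
    # 0x110000 fits in four UTF-8 bytes but lies beyond plane 16.
    for cp in (0x41, 0x10FFFF, 0x110000, 0x1FFFFF):
        print(hex(cp), is_xml_char(cp))
```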

- Dave

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1