xml-dev - Re: UTF-8 vs UTF-16...?


From: David Brownell <david-b@pacbell.net>
To: Kragen Sitaker <kragen@pobox.com>
Date: Wed, 17 Nov 1999 10:47:32 -0800

Kragen Sitaker wrote:
> 
> According to the latest Unicode book (is it version 2.0?  Or 3.0?)
> UTF-8 does not allow you to encode more than the first 17 planes of ISO
> 10646. 

The Unicode book has a bias:  it only talks about the Unicode
aspects of UTF-8.  I've always felt that to be a disservice,
since they didn't develop or standardize UTF-8 and are thus
spreading misinformation.  (They could at least _mention_ the
fact that they're presenting a Unicode subset of full UTF-8!)

Better information is thankfully freely accessible.  See:

    http://www.ietf.org/rfc/rfc2279.txt

which includes the details of the five- and six-byte encodings.
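
For illustration, here is a minimal Python sketch of the encoding rules
in section 2 of RFC 2279, covering all six sequence lengths.  The names
encode_rfc2279 and _RANGES are hypothetical, chosen for this example
only.  The sketch does not treat surrogate code points specially.

```python
# Upper bound of each sequence length and its lead-byte marker,
# following the table in section 2 of RFC 2279.
_RANGES = [
    (0x7F, 0x00),        # 1 byte:  0xxxxxxx
    (0x7FF, 0xC0),       # 2 bytes: 110xxxxx
    (0xFFFF, 0xE0),      # 3 bytes: 1110xxxx
    (0x1FFFFF, 0xF0),    # 4 bytes: 11110xxx
    (0x3FFFFFF, 0xF8),   # 5 bytes: 111110xx
    (0x7FFFFFFF, 0xFC),  # 6 bytes: 1111110x
]


def encode_rfc2279(code_point):
    """Encode one UCS-4 value (0 to 0x7FFFFFFF) as RFC 2279 UTF-8 bytes."""
    if not 0 <= code_point <= 0x7FFFFFFF:
        raise ValueError("outside the 31-bit UCS-4 range")
    # Pick the shortest sequence length whose range holds the value.
    for length, (upper, marker) in enumerate(_RANGES, start=1):
        if code_point <= upper:
            break
    if length == 1:
        return bytes([code_point])
    out = []
    # Fill continuation bytes (10xxxxxx) from the low-order end, 6 bits each.
    for _ in range(length - 1):
        out.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    # Whatever bits remain go into the lead byte.
    out.append(marker | code_point)
    return bytes(reversed(out))
```

With this sketch, 0x10FFFF (the last value Unicode can express) takes
four bytes, while 0x200000 already needs the five-byte form.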

Note that even with a four-byte subset of UTF-8, you can encode
characters that can't be expressed in Unicode:  four bytes reach
up to 0x1FFFFF, while Unicode stops at 0x10FFFF.  A few of the
test cases in the OASIS/NIST test suite (these cases happen to
come from James Clark's XMLTEST package) have such characters,
and any conformant XML processor must report a fatal error when
it sees them.
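
A rough Python sketch of the check involved, assuming the Char
production of XML 1.0 as the definition of a legal character.  The
function names decode_four_byte and is_xml_char are hypothetical, and
the decoder deliberately skips other checks (such as overlong forms)
that a real parser would also need.

```python
def decode_four_byte(seq):
    """Decode a four-byte UTF-8 sequence (11110xxx plus three 10xxxxxx)."""
    if len(seq) != 4 or (seq[0] & 0xF8) != 0xF0:
        raise ValueError("not a four-byte lead byte")
    if any((b & 0xC0) != 0x80 for b in seq[1:]):
        raise ValueError("bad continuation byte")
    # 3 payload bits from the lead byte, 6 from each continuation byte.
    return (
        (seq[0] & 0x07) << 18
        | (seq[1] & 0x3F) << 12
        | (seq[2] & 0x3F) << 6
        | (seq[3] & 0x3F)
    )


def is_xml_char(cp):
    """True if cp matches the Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# F4 8F BF BF is 0x10FFFF, the last legal character.
# F4 90 80 80 is 0x110000, one past it: well-formed as a four-byte
# sequence, but not a character an XML processor may accept.
for raw in (b"\xf4\x8f\xbf\xbf", b"\xf4\x90\x80\x80"):
    cp = decode_four_byte(raw)
    print(hex(cp), "ok" if is_xml_char(cp) else "fatal error")
```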

- Dave
