- From: David Brownell <david-b@pacbell.net>
- To: Kragen Sitaker <kragen@pobox.com>
- Date: Wed, 17 Nov 1999 10:47:32 -0800
Kragen Sitaker wrote:
>
> According to the latest Unicode book (is it version 2.0? Or 3.0?)
> UTF-8 does not allow you to encode more than the first 17 planes of ISO
> 10646.
The Unicode book has a bias: it describes only the part of UTF-8
that Unicode uses. I've always felt that to be a disservice. Its
authors didn't develop or standardize UTF-8, and presenting a subset
as if it were the whole encoding spreads misinformation. (They could
at least _mention_ that they're presenting a Unicode subset of full
UTF-8!)
Fortunately, better information is freely available. See RFC 2279:
http://www.ietf.org/rfc/rfc2279.txt
which includes the details of the five- and six-byte encodings.
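
As a rough illustration of the scheme RFC 2279 describes, here is a minimal Python sketch of an encoder that covers all six sequence lengths. The function name utf8_encode_rfc2279 is made up for this example, and the sketch makes no attempt to reject surrogates or apply any Unicode-specific restriction; it only shows how the bits are laid out.

```python
def utf8_encode_rfc2279(code_point: int) -> bytes:
    """Encode a UCS-4 value with the RFC 2279 scheme (1 to 6 bytes).

    Hypothetical helper for illustration only; it does not apply any
    Unicode-specific restrictions such as rejecting surrogates.
    """
    if not 0 <= code_point <= 0x7FFFFFFF:
        raise ValueError("RFC 2279 UTF-8 covers 0 through 0x7FFFFFFF only")
    if code_point < 0x80:
        return bytes([code_point])  # one byte, identical to ASCII

    # (largest value, sequence length, lead-byte marker bits)
    ranges = [
        (0x7FF, 2, 0xC0),
        (0xFFFF, 3, 0xE0),
        (0x1FFFFF, 4, 0xF0),
        (0x3FFFFFF, 5, 0xF8),
        (0x7FFFFFFF, 6, 0xFC),
    ]
    for limit, length, marker in ranges:
        if code_point <= limit:
            break

    # Peel off six bits at a time, lowest bits first, for the
    # continuation bytes (each has the form 10xxxxxx).
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (code_point & 0x3F))
        code_point >>= 6

    # Whatever bits remain go into the lead byte under its marker.
    return bytes([marker | code_point] + tail[::-1])
```

For values that Python's own codec accepts, the result should agree with str.encode("utf-8"); the five- and six-byte forms have no counterpart there, since Python follows the later, Unicode-restricted definition of UTF-8.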
Note that even the four-byte subset of UTF-8 can encode values that
can't be expressed in Unicode: four bytes reach 0x1FFFFF, while
Unicode stops at 0x10FFFF. A few of the test cases in the OASIS/NIST
test suite (these cases happen to come from James Clark's XMLTEST
package) have such characters, and any conformant XML processor must
report a fatal error when it sees them.
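
To make the four-byte case concrete, the following Python sketch decodes a single four-byte sequence and checks the result against the Char production of XML 1.0. The names decode_four_byte and is_xml_char are invented for this example, and the decoder is deliberately incomplete: it does not reject overlong forms, which a real processor would also have to handle.

```python
def decode_four_byte(seq: bytes) -> int:
    """Decode one four-byte UTF-8 sequence (lead byte 0xF0..0xF7).

    Hypothetical helper for illustration; overlong encodings are
    not detected here.
    """
    if len(seq) != 4 or seq[0] & 0xF8 != 0xF0:
        raise ValueError("not a four-byte UTF-8 sequence")
    if any(b & 0xC0 != 0x80 for b in seq[1:]):
        raise ValueError("bad continuation byte")
    value = seq[0] & 0x07          # three payload bits in the lead byte
    for b in seq[1:]:
        value = (value << 6) | (b & 0x3F)  # six payload bits per byte
    return value


def is_xml_char(cp: int) -> bool:
    """True if cp matches the Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# F4 8F BF BF is the last value Unicode can express.
last_unicode = decode_four_byte(b"\xf4\x8f\xbf\xbf")
# F4 90 80 80 is the very next value, still only four bytes long.
first_beyond = decode_four_byte(b"\xf4\x90\x80\x80")

print(hex(last_unicode), is_xml_char(last_unicode))
print(hex(first_beyond), is_xml_char(first_beyond))
```

The second sequence is a perfectly regular four-byte pattern, yet its value falls outside the Char production, which is the situation those test cases exercise.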
- Dave
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1