- From: "Peter S. Housel" <firstname.lastname@example.org>
- To: <email@example.com>
- Date: Tue, 4 Jan 2000 12:02:26 -0800
> No one's disagreeing with the use of Unicode; we're talking about
> which character encoding we'll use to represent it. You can represent
> Unicode in variable-width 8-bit or 16-bit encodings or in fixed-width
> 32-bit encodings.
My reading of the Unicode 2.x standard is that the above isn't strictly
correct. It is correct if you change "Unicode" to "the ISO 10646 Universal
Character Set" though.
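To make the quoted encoding forms concrete, here is a small Python sketch of how many bytes one character takes in an 8-bit, a 16-bit and a 32-bit form. The sample characters, the helper name `encoded_widths`, and the use of Python's `utf-8`, `utf-16-be` and `utf-32-be` codecs are my own choices for illustration, not anything from the thread.

```python
# Sample characters are illustrative assumptions, not taken from the thread.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U00010000"]

def encoded_widths(ch):
    """Return the byte counts for one character as (UTF-8, UTF-16, UTF-32).

    Big-endian codecs are used so that no byte order mark is added.
    """
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-be")),
        len(ch.encode("utf-32-be")),
    )

for ch in SAMPLES:
    print("U+%04X" % ord(ch), encoded_widths(ch))
```

The 8-bit form varies from one to four bytes, the 16-bit form is two bytes except for the last sample, which lies outside the first 65,536 code points and takes four, and the 32-bit form is always four bytes.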
> Note that Java uses UTF-16, which isn't quite fixed-width, though no
> one really notices.
It seems to me that Java uses Unicode, which maintains the semantics that 16
bits equals one character. Surrogates are characters in Unicode, whereas
those code points are not legal UCS characters, but only artifacts of the
UTF-16 encoding. Unicode looks like UTF-16, but the semantics are slightly
different. So a
file using UTF-16 encoding containing a single "astral plane" character of
the UCS would be interpreted by Unicode as a file containing *two* surrogate
characters. (I think it's a strange tack to take, but it seems fairly clear
to me that this was their position as of Unicode 2.x. I haven't looked at
3.0 yet, so things may have changed since then.)
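The difference between the two readings can be sketched in a few lines of Python. This is only an illustration: the choice of U+10000 as the "astral plane" character and the helper names `utf16_code_units` and `is_surrogate` are my assumptions, and Python's own string model here plays the role of the UCS view.

```python
import struct

# U+10000 is the first code point beyond the 16-bit range; chosen for illustration.
ASTRAL = "\U00010000"

def utf16_code_units(text):
    """Encode text as big-endian UTF-16 and return its 16-bit code units."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))

def is_surrogate(unit):
    """True if a 16-bit value lies in the surrogate range D800..DFFF."""
    return 0xD800 <= unit <= 0xDFFF

units = utf16_code_units(ASTRAL)

# Reading the file as a sequence of 16-bit values ("16 bits equals one
# character") yields two values, and both are surrogates.
print(len(units), [is_surrogate(u) for u in units])

# Decoding the same bytes as UTF-16 yields a single UCS character.
print(len(ASTRAL.encode("utf-16-be").decode("utf-16-be")))
```

A reader that treats every 16-bit unit as a character counts two here, while a reader that decodes UTF-16 into UCS characters counts one.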
The XML character set is the UCS, not Unicode.
xml-dev: A list for W3C XML Developers. To post, mailto:firstname.lastname@example.org
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:email@example.com)