At 7:42 AM -0700 4/29/03, Tim Bray wrote:
>Really? I just looked at a recent set of Java docs, and it's pretty
>clear that a Java char isn't really a character, it's a UTF-16
>codepoint, and the semantics of String are wrong for non-BMP
>characters, and that the attempt at UTF-8 support remains pretty
>laughably nonstandard and wrong. I'd be *delighted* to hear that
>I'm looking at wrong/obsolete docs. Pointers anyone? -Tim
Unfortunately, you're more than half right. The InputStreamReader and
OutputStreamWriter classes do handle UTF-8 correctly. The readUTF and
writeUTF methods in DataInputStream/DataOutputStream don't; they read
and write a modified, Java-specific encoding rather than standard
UTF-8. This wouldn't be a problem if they were simply called
readString/writeString instead. However, your comments about the char
type are dead on.
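
To make the difference concrete, here is a minimal sketch that is not
part of the original exchange. The class and method names (Utf8Demo,
standardUtf8, modifiedUtf8) are made up for illustration, and it assumes
Java 8 or later, so it uses conveniences that did not exist in 2003. It
encodes the same string once through OutputStreamWriter and once through
DataOutputStream.writeUTF and compares the byte counts:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class Utf8Demo {

    // Encode through OutputStreamWriter, which emits standard UTF-8.
    static byte[] standardUtf8(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            w.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Encode through DataOutputStream.writeUTF and drop the two-byte
    // length prefix it writes first, leaving only the encoded characters.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = bytes.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF, a non-BMP character that Java
        // stores as a surrogate pair of two chars.
        String clef = "\uD834\uDD1E";

        System.out.println(clef.length());                        // 2 chars
        // codePointCount was added in Java 5, after this thread.
        System.out.println(clef.codePointCount(0, clef.length())); // 1 character

        System.out.println(standardUtf8(clef).length); // 4: one four-byte sequence
        System.out.println(modifiedUtf8(clef).length); // 6: each surrogate as three bytes

        System.out.println(standardUtf8("\0").length); // 1: the single byte 0x00
        System.out.println(modifiedUtf8("\0").length); // 2: the bytes 0xC0 0x80
    }
}
```

The six-byte form and the two-byte encoding of U+0000 are what a
standard UTF-8 decoder would reject, which is why data written with
writeUTF should only be read back with readUTF.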
--
Elliotte Rusty Harold
elharo@metalab.unc.edu
Processing XML with Java (Addison-Wesley, 2002)
http://www.cafeconleche.org/books/xmljava
http://www.amazon.com/exec/obidos/ISBN%3D0201771861/cafeaulaitA