When one considers Java's implementation-specific 8-bit external string
encoding, one should keep in mind its purpose[1] and its specified relation
to Java's primitive data representations[2].
[1] http://java.sun.com/j2se/1.4.1/docs/api/java/io/DataInputStream.html
[2] http://java.sun.com/docs/books/vmspec/2nd-edition/html/ClassFile.doc.html#20080
Miles Sabin wrote:
>
> Elliotte Rusty Harold wrote,
> > At 7:42 AM -0700 4/29/03, Tim Bray wrote:
> > > Really? I just looked at a recent set of Java docs, and it's pretty
> > > clear that a Java char isn't really a character, it's a UTF-16
> > > codepoint, and the semantics of String are wrong for non-BMP
> > > characters, and that the attempt at UTF-8 support remains pretty
> > > laughably nonstandard and wrong. I'd be *delighted* to hear that
> > > I'm looking at wrong/obsolete docs. Pointers anyone? -Tim
> >
> > Unfortunately, you're more than half right. The InputStreamReader and
> > OutputStreamWriter classes do handle UTF-8 correctly. The readUTF and
> > writeUTF methods in DataInputStream/DataOutputStream don't. This
> > wouldn't be a problem if they were simply called readString/
> > writeString instead.
>
> Yup, that's right ... for all intents and purposes, readUTF and writeUTF
> should be treated as specifying a non-standard encoding solely for the
> use of Java RMI.
>
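The difference described above can be seen by writing the same strings both
ways. The following is a minimal sketch; the class and helper names
(ModifiedUtf8Demo, writeUtfBody, standardUtf8, hex) are hypothetical and exist
only for illustration. It compares the bytes DataOutputStream.writeUTF emits
(after its 2-byte length prefix) with those from an OutputStreamWriter using
the standard UTF-8 charset, for U+0000 and for a non-BMP character (U+1F600,
which a Java String holds as a surrogate pair). It assumes a JDK with
java.nio.charset.StandardCharsets (Java 7 or later).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: compares writeUTF output with standard UTF-8.
public class ModifiedUtf8Demo {

    // Bytes written by DataOutputStream.writeUTF, with the 2-byte length prefix dropped.
    static byte[] writeUtfBody(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Bytes written by an OutputStreamWriter using the standard UTF-8 charset.
    static byte[] standardUtf8(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(buf, StandardCharsets.UTF_8)) {
            w.write(s); // closing the writer flushes the encoder
        }
        return buf.toByteArray();
    }

    // Formats bytes as space-separated uppercase hex (byte values treated as unsigned).
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // U+0000, then U+1F600 given as its UTF-16 surrogate pair
        String[] samples = { "\u0000", "\uD83D\uDE00" };
        for (String s : samples) {
            System.out.println("writeUTF: " + hex(writeUtfBody(s)));
            System.out.println("UTF-8:    " + hex(standardUtf8(s)));
        }
    }
}
```

Running it should print:

    writeUTF: C0 80
    UTF-8:    00
    writeUTF: ED A0 BD ED B8 80
    UTF-8:    F0 9F 98 80

writeUTF encodes NUL as the two bytes C0 80 and encodes each surrogate of the
pair separately as three bytes, whereas standard UTF-8 uses a single 00 byte
and a single 4-byte sequence. The former is what later Java documentation calls
"modified UTF-8": it round-trips through readUTF, but it is not UTF-8 as other
software expects it.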