




   Re: [xml-dev] XML 1.1 grinds to halt?


Elliotte Rusty Harold wrote,
> At 7:42 AM -0700 4/29/03, Tim Bray wrote:
> > Really?  I just looked at a recent set of Java docs, and it's pretty
> > clear that a Java char isn't really a character, it's a UTF-16
> > codepoint, and the semantics of String are wrong for non-BMP
> > characters, and that the attempt at UTF-8 support remains pretty
> > laughably nonstandard and wrong.  I'd be *delighted* to hear that
> > I'm looking at wrong/obsolete docs.  Pointers anyone? -Tim
> Unfortunately, you're more than half right. The InputStreamReader and
> OutputStreamWriter classes do handle UTF-8 correctly. The readUTF and
> writeUTF methods in DataInputStream/DataOutputStream don't. This
> wouldn't be a problem if they were simply called readString/ 
> writeString instead.

Yup, that's right. For all intents and purposes, readUTF and writeUTF
should be treated as specifying a nonstandard encoding (what the Javadocs
call "modified UTF-8") intended solely for the use of Java RMI.
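To make the difference concrete, here is a small sketch (written against a modern JDK, so it uses try-with-resources and StandardCharsets, neither of which existed in 2003) that encodes the same string both ways. The class and method names are made up for illustration. It strips the 2-byte length prefix that writeUTF always emits, then compares the bytes with what an OutputStreamWriter produces for UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Demo {
    // Encode with DataOutputStream.writeUTF ("modified UTF-8"),
    // dropping the 2-byte big-endian length prefix writeUTF writes first.
    static byte[] viaWriteUTF(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        }
        byte[] all = bytes.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Encode the same string as standard UTF-8 through a Writer.
    static byte[] viaWriter(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            w.write(s);
        }
        return bytes.toByteArray();
    }

    // Render bytes as space-separated uppercase hex.
    static String hex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) {
            sb.append(String.format("%02X ", x & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // U+0000 followed by U+1D400, a non-BMP character stored as a surrogate pair.
        String s = "\u0000\uD835\uDC00";
        System.out.println("writeUTF: " + hex(viaWriteUTF(s)));
        System.out.println("Writer:   " + hex(viaWriter(s)));
    }
}
```

This should print `C0 80 ED A0 B5 ED B0 80` for writeUTF and `00 F0 9D 90 80` for the Writer. writeUTF encodes U+0000 as the two-byte form C0 80 and encodes each surrogate separately as three bytes. Standard UTF-8 uses a single 00 byte and one 4-byte sequence for the supplementary character, which is why the two formats shouldn't be mixed.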

> However, your comments about the char types are dead on.

They're dead on, but unhelpful. There's really nothing that can be done
right now that wouldn't break an awful lot of existing code. At least
redesignating Java chars as UTF-16 code units is honest.
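What "char as UTF-16 code unit" means in practice: a non-BMP character occupies two chars, so String.length() overcounts it. The sketch below uses Character.toChars and String.codePointCount (added in Java 5) and String.codePoints (Java 8), none of which were available when this was written. The class name is invented for the example.

```java
public class CharVsCodePoint {
    // U+1D400 MATHEMATICAL BOLD CAPITAL A lies outside the BMP, so a Java
    // String stores it as two chars (a UTF-16 high/low surrogate pair).
    static final String SAMPLE = new String(Character.toChars(0x1D400));

    public static void main(String[] args) {
        String s = SAMPLE;

        System.out.println(s.length());                             // 2 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length()));        // 1 character
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true

        // Iterating by code point sees one character, not two chars.
        s.codePoints().forEach(cp -> System.out.printf("U+%X%n", cp)); // U+1D400
    }
}
```

Code that indexes a String char by char (substring, charAt, length) can therefore split a surrogate pair. Code that needs to work on characters has to go through the code point methods instead.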

If we're lucky, the output of this,


might help in the not too distant future.





Copyright 2001 XML.org. This site is hosted by OASIS