RE: Java/Unicode brain damage

At 10:14 AM -0700 7/26/01, Benjamin Franz wrote:

>I'm being dense today. When you say 'UTF-16 units' do you mean that in
>Java a single character in the surrogate ranges may consist of (correctly
>IMHO) a _complete_ 32-bit surrogate pair or (dain bramagedly) of the
>individual 'halfs' of the pair (thus making a single character into two
>individual 'units' of 16-bits each)?

The latter.
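
To make "the latter" concrete, here is a minimal sketch of one character occupying two 16-bit units. It relies on code point methods (`Character.toChars`, `String.codePointCount`) that only entered Java in J2SE 5.0, after this message was written. The class name `SurrogateDemo` and the choice of U+1D11E (MUSICAL SYMBOL G CLEF) are purely illustrative assumptions.

```java
public class SurrogateDemo {
    // Builds a String holding the single character U+1D11E,
    // which lies outside the Basic Multilingual Plane.
    static String gClef() {
        return new String(Character.toChars(0x1D11E));
    }

    public static void main(String[] args) {
        String s = gClef();
        // length() counts 16-bit UTF-16 units, not characters.
        System.out.println(s.length());                        // prints 2
        // codePointCount() counts whole Unicode characters.
        System.out.println(s.codePointCount(0, s.length()));   // prints 1
        // Each charAt() call returns only one half of the surrogate pair.
        System.out.println(Integer.toHexString(s.charAt(0)));  // prints d834
        System.out.println(Integer.toHexString(s.charAt(1)));  // prints dd1e
    }
}
```

Neither half is a character on its own; code that indexes or truncates by `char` can split the pair.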

>If the latter, then Java's handling of
>Unicode is broken-as-designed and must be fixed (most likely via
>deprecation of the existing String in favor of a completely new string
>type for the sake of backwards compatibility with already deployed apps).

It's worse. It's not just the String class. It's the char primitive data type, which is much harder to change precisely because it's not a class.
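
As a rough illustration of why the primitive is the sticking point, the sketch below shows that a `char` is 16 bits wide and so cannot hold a code point above U+FFFF. The helper methods on `Character` used here were added in J2SE 5.0, later than this message. The class name `CharWidthDemo` and the `truncate` helper are hypothetical.

```java
public class CharWidthDemo {
    // Casting an int code point to char keeps only the low 16 bits.
    static char truncate(int codePoint) {
        return (char) codePoint;
    }

    public static void main(String[] args) {
        int cp = 0x1D11E; // a code point outside the Basic Multilingual Plane

        // A char is always 16 bits.
        System.out.println(Character.SIZE);                          // prints 16
        // The code point needs more than one char to represent.
        System.out.println(Character.isSupplementaryCodePoint(cp));  // prints true
        System.out.println(Character.charCount(cp));                 // prints 2
        // Forcing it into a char silently yields a different character.
        System.out.println(Integer.toHexString(truncate(cp)));       // prints d11e
    }
}
```

Because `char` is a primitive, its width cannot be changed behind an API the way a class's internals could be.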

In 20/20 hindsight, there probably never should have been a char type in the first place, and all APIs should have been designed to work with String and Character objects instead.

| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
|          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
|              http://www.ibiblio.org/xml/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      | 
|  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |