OASIS Mailing List Archives

 



RE: Java/Unicode brain damage



Benjamin Franz wrote,
> Miles Sabin wrote,
> > [snip: answering the wrong question]
>
> I'm being dense today. When you say 'UTF-16 units' do you mean that
> in Java a single character in the surrogate ranges may consist of 
> (correctly IMHO) a _complete_ 32-bit surrogate pair or (dain 
> bramagedly) of the individual 'halves' of the pair (thus making a 
> single character into two individual 'units' of 16-bits each)? If 
> the latter, then Java's handling of Unicode is broken-as-designed and 
> must be fixed (most likely via deprecation of the existing String in 
> favor of a completely new string type for the sake of backwards 
> compatibility with already deployed apps).

A Java 'char' is a 16-bit data type, so it simply isn't possible for
it to directly represent an arbitrary Unicode character. There's no
way of changing that, so treating chars as UTF-16 units seems to be
the best option. So, yes, a Java char might represent one half of
a surrogate pair. Whether that's brain damaged or not is likely to
depend on your point of view. But it's not a broken representation of 
Unicode characters ... it _isn't_ a representation of Unicode
characters.
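
To make that concrete, here is a minimal sketch of what "a char is a
UTF-16 unit" means in practice. The class and method names are
hypothetical, made up for illustration; only String.length() and
String.charAt() are real API, and the arithmetic is the standard
UTF-16 surrogate decoding.

```java
// Illustration only: Utf16Units and its methods are hypothetical names.
final class Utf16Units {
    // High (leading) surrogates occupy U+D800..U+DBFF.
    static boolean isHighSurrogate(char c) {
        return c >= 0xD800 && c <= 0xDBFF;
    }

    // Low (trailing) surrogates occupy U+DC00..U+DFFF.
    static boolean isLowSurrogate(char c) {
        return c >= 0xDC00 && c <= 0xDFFF;
    }

    // Combine the two halves of a surrogate pair into one character value.
    static int toCodePoint(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        // U+1D11E (musical symbol G clef) needs a surrogate pair in UTF-16.
        String s = "\uD834\uDD1E";
        System.out.println(s.length());                      // 2 chars, 1 character
        System.out.println(isHighSurrogate(s.charAt(0)));    // true
        System.out.println(isLowSurrogate(s.charAt(1)));     // true
        System.out.println(Integer.toHexString(
                toCodePoint(s.charAt(0), s.charAt(1))));     // 1d11e
    }
}
```

So one character outside the 16-bit range shows up as two chars, and
each charAt() call hands back one half of the pair.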

That does leave the String class dangling tho'. As it stands all its
method and constructor signatures are defined in terms of chars, and
those can't be changed without breaking almost all the Java code in
existence ... and for that same reason deprecation doesn't seem very
likely either. It might be possible to retrofit character-oriented 
methods, but it'd probably be a better option to create a completely 
new class. There's nothing to stop anyone from doing that, tho' it
wouldn't hurt if interested parties took a proposal to the JCP.
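
For what it's worth, a rough sketch of what such a new class might
look like follows. Everything about it is an assumption for the sake
of illustration: the name CodePointString, its two methods, and the
choice to leave unpaired surrogates in place are all hypothetical,
not a proposal anyone has actually made.

```java
// Hypothetical sketch: a string type indexed by whole characters
// rather than by 16-bit UTF-16 units.
final class CodePointString {
    private final int[] codePoints;

    CodePointString(String utf16) {
        int[] buf = new int[utf16.length()];
        int n = 0;
        for (int i = 0; i < utf16.length(); i++) {
            char c = utf16.charAt(i);
            // A high surrogate followed by a low surrogate is one character.
            if (c >= 0xD800 && c <= 0xDBFF && i + 1 < utf16.length()) {
                char d = utf16.charAt(i + 1);
                if (d >= 0xDC00 && d <= 0xDFFF) {
                    buf[n++] = 0x10000 + ((c - 0xD800) << 10) + (d - 0xDC00);
                    i++; // skip the low half we just consumed
                    continue;
                }
            }
            // Assumption: 16-bit characters and unpaired surrogates are kept as-is.
            buf[n++] = c;
        }
        codePoints = new int[n];
        System.arraycopy(buf, 0, codePoints, 0, n);
    }

    // Number of characters, not number of chars.
    int length() {
        return codePoints.length;
    }

    // The character at the given index, as a 32-bit value.
    int characterAt(int index) {
        return codePoints[index];
    }
}
```

With this, new CodePointString("a\uD834\uDD1Eb") reports a length of
3 where the underlying String reports 4. The cost is 32 bits per
character and a copy on construction, which is one reason a real
design would need more thought than this.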

Cheers,


Miles

-- 
Miles Sabin                                     InterX
Internet Systems Architect                      27 Great West Road
+44 (0)20 8817 4030                             Middx, TW8 9AS, UK
msabin@interx.com                               http://www.interx.com/