OASIS Mailing List Archives

Re: Java/Unicode brain damage



> Furthermore, I think Java is broken enough here that Java needs to change.
> I don't think XML should be limited by this brain damage in Java.

I think "broken" is seriously overstating things.  It's not a real issue "now",
and in fact if you accept that UTF-16 is native, the issue is just a
missing library feature that's got workarounds.

Anyone who really needs such support can code it themselves; it's clear
things could be better, but there's no fatal problem.  Just a need for an
overdue (!) update to the standard Java library.
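
To make "code it themselves" concrete, here is a rough sketch of the kind of
workaround I mean: decoding surrogate pairs by hand on top of 16-bit chars.
The class and method names are made up for illustration (they are not part of
the standard library), and unpaired surrogates are simply passed through
rather than reported as errors.

```java
// Hypothetical helper class; all names here are illustrative assumptions.
class SurrogateSketch {
    static boolean isHighSurrogate(char c) { return c >= 0xD800 && c <= 0xDBFF; }
    static boolean isLowSurrogate(char c)  { return c >= 0xDC00 && c <= 0xDFFF; }

    // Returns the full character value starting at index i, combining a
    // high/low surrogate pair into one value when a valid pair is present.
    static int codePointAt(String s, int i) {
        char hi = s.charAt(i);
        if (isHighSurrogate(hi) && i + 1 < s.length()) {
            char lo = s.charAt(i + 1);
            if (isLowSurrogate(lo)) {
                return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
            }
        }
        return hi; // BMP character, or an unpaired surrogate passed through
    }

    // Counts characters, treating each valid surrogate pair as one.
    static int countCodePoints(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            n++;
            if (isHighSurrogate(s.charAt(i)) && i + 1 < s.length()
                    && isLowSurrogate(s.charAt(i + 1))) {
                i++; // skip the low half of the pair
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String s = "a\uD800\uDC00b"; // 'a', U+10000 as a surrogate pair, 'b'
        System.out.println(s.length());                             // 4
        System.out.println(countCodePoints(s));                     // 3
        System.out.println(Integer.toHexString(codePointAt(s, 1))); // 10000
    }
}
```

Nothing here needs a wider "char"; it only needs code that stops assuming
one char is one character.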


>     One silver
> lining to the Blueberry cloud might be that it could convince Sun to use a
> four-byte char like they should have back in 1995. 

Nah, people complain enough about wasted space ... admittedly
there's a religious war on whether (in C terms) "wchar_t" should
be 16 bits or 32.

But I did expect Sun would have addressed the issue of variable
length characters in Java by now.  The paper David Jackson
pointed to (http://www.unicode.org/iuc/iuc16/b17/paper.pdf) is
from last year, but the issues weren't new then.  Variable length
characters show up in the case of combining marks there, not just
with surrogate pairs, and a 32-bit wchar_t won't help with the
case of combining marks:  fat wchar_t isn't sufficient.
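
A small illustration of the combining-mark point, as a sketch only.  It
groups a base character with any marks that follow it, using the general
categories from Character.getType; that is a simplification I'm assuming for
the example, not full Unicode text segmentation, and the class and method
names are invented here.

```java
// Hypothetical helper class; names and the simplified grouping rule are
// assumptions made for illustration.
class CombiningSketch {
    // True for characters in the Unicode mark categories (Mn, Me, Mc).
    static boolean isCombiningMark(char c) {
        int t = Character.getType(c);
        return t == Character.NON_SPACING_MARK
            || t == Character.ENCLOSING_MARK
            || t == Character.COMBINING_SPACING_MARK;
    }

    // Counts "base plus following marks" sequences.  A leading mark with no
    // base in front of it is counted as its own sequence.
    static int countCombiningSequences(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (i == 0 || !isCombiningMark(s.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // 'e' followed by U+0301 (combining acute accent): one character to
        // the reader, two code units, and both are in the BMP.
        String s = "e\u0301";
        System.out.println(s.length());                 // 2
        System.out.println(countCombiningSequences(s)); // 1
    }
}
```

Neither value in that string needs more than 16 bits, so a 32-bit unit would
still store two units for what the reader sees as one character.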

- Dave