Re: Java/Unicode brain damage
- From: David Brownell <david-b@pacbell.net>
- To: Elliotte Rusty Harold <elharo@metalab.unc.edu>, xml-dev@lists.xml.org
- Date: Thu, 26 Jul 2001 08:45:49 -0700
> Furthermore, I think Java is broken enough here that Java needs to change.
> I don't think XML should be limited by this brain damage in Java.
I think "broken" is seriously overstating things. It's not a real issue "now",
and in fact if you accept that UTF-16 is native, the issue is just missing
library support for a feature, and there are workarounds.
Anyone who really needs such support can code it themselves; it's clear
things could be better, but there's no fatal problem. Just a need for an
overdue (!) update to the standard Java library.
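The "code it themselves" workaround can be sketched as follows: a minimal, hand-rolled decoder that walks a Java String and combines each UTF-16 surrogate pair into a single code point. The class SurrogateDecoder and method toCodePoints are hypothetical names for illustration only. The sketch deliberately uses plain char arithmetic rather than any library helper, and it passes unpaired surrogates through unchanged, which is an assumption about the desired error handling.

```java
// Hypothetical helper class for illustration; not part of any standard library.
public class SurrogateDecoder {

    // Decode a UTF-16 string into an array of Unicode code points,
    // pairing high (U+D800..U+DBFF) and low (U+DC00..U+DFFF) surrogates by hand.
    public static int[] toCodePoints(String s) {
        int[] buf = new int[s.length()]; // never more code points than chars
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '\uD800' && c <= '\uDBFF' && i + 1 < s.length()) {
                char d = s.charAt(i + 1);
                if (d >= '\uDC00' && d <= '\uDFFF') {
                    // Standard UTF-16 decoding formula for a surrogate pair
                    buf[n++] = ((c - 0xD800) << 10) + (d - 0xDC00) + 0x10000;
                    i++; // the low surrogate has been consumed as well
                    continue;
                }
            }
            // BMP character, or an unpaired surrogate passed through as-is
            buf[n++] = c;
        }
        int[] out = new int[n];
        System.arraycopy(buf, 0, out, 0, n);
        return out;
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF is stored as the pair D834 DD1E
        String s = "a\uD834\uDD1Eb";
        int[] cps = toCodePoints(s);
        System.out.println(s.length() + " chars, " + cps.length + " code points");
        System.out.println(Integer.toHexString(cps[1]));
    }
}
```

Running main should report 4 chars but 3 code points, with the middle code point printed as 1d11e: the String API counts UTF-16 units, so any code that cares about characters beyond the BMP has to do this pairing itself.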
> One silver
> lining to the Blueberry cloud might be that it could convince Sun to use a
> four-byte char like they should have back in 1995.
Nah, people complain enough about wasted space ... admittedly
there's a religious war on whether (in C terms) "wchar_t" should
be 16 bits or 32.
But I did expect Sun would have addressed the issue of variable
length characters in Java by now. The paper David Jackson
pointed to (http://www.unicode.org/iuc/iuc16/b17/paper.pdf) is
from last year, but the issues weren't new then. As that paper
shows, variable length characters arise from combining marks,
not just from surrogate pairs, and a 32-bit wchar_t won't help
with combining marks: a fat wchar_t isn't sufficient.
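To illustrate the point about combining marks, here is a small sketch showing that moving to whole code points (the "fat wchar_t" view) still does not give "one unit = one character". It assumes a current JDK, since String.codePointCount was added to Java after this discussion; java.text.BreakIterator's character instance is used to find user-perceived character boundaries. The class CombiningMarks and method countPerceived are hypothetical names for illustration.

```java
import java.text.BreakIterator;

// Hypothetical demo class for illustration; not part of any standard library.
public class CombiningMarks {

    // Count user-perceived characters with the JDK's character BreakIterator,
    // which groups a base character with the combining marks that follow it.
    public static int countPerceived(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        it.first();
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "e" followed by U+0301 COMBINING ACUTE ACCENT: one visible letter
        String s = "e\u0301";
        System.out.println("UTF-16 chars:    " + s.length());
        System.out.println("code points:     " + s.codePointCount(0, s.length()));
        System.out.println("perceived chars: " + countPerceived(s));
    }
}
```

For this string it should print two UTF-16 chars, two code points, and one perceived character: even with one 32-bit unit per code point, the base letter and its accent are still separate units.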
- Dave