Re: Java/Unicode brain damage
- From: James Clark <email@example.com>
- To: Elliotte Rusty Harold <firstname.lastname@example.org>, email@example.com
- Date: Thu, 26 Jul 2001 13:33:01 +0700
> The Java way to handle this is to stop thinking of a Java char as
> representing a Unicode character. It doesn't. A Java char represents a
> UTF-16 code unit, which may be a surrogate. The public API to
> java.lang.String is essentially a UTF-16 API. For example, the length()
> method of a string does not return the number of Unicode characters in
> the string. Rather it returns the number of UTF-16 code units. A string
> containing a single Plane-1 character has length 2 in Java.
I agree with this analysis.
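
A minimal sketch of the length() point, for illustration only. The class and method names are made up here, and codePointCount() is an assumption about the reader's JDK: it was only added in J2SE 5.0 and is not in the JDK current at the time of this message. The sample character is U+10400 (DESERET CAPITAL LETTER LONG I), written as its surrogate pair.

```java
public class Utf16Length {
    // U+10400, a Plane-1 character, spelled as its UTF-16 surrogate pair.
    static final String PLANE1 = "\uD801\uDC00";

    // What String.length() reports: the number of UTF-16 code units.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points); needs J2SE 5.0 or later.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(codeUnits(PLANE1));   // prints 2
        System.out.println(codePoints(PLANE1));  // prints 1
    }
}
```

For a string that stays inside the BMP the two counts agree, which is why the difference is easy to overlook.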
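Much more problematic than java.lang.String is java.lang.Character. An API
that classifies a single char, such as Character.isLetter(char), doesn't
work too well once you have letters outside the BMP: such a letter takes
two chars, and neither surrogate is a letter on its own. The JDK ought to
have a class representing a Unicode character (scalar value).
Unfortunately, .NET (System.Char) has the same problem.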
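
To make the java.lang.Character problem concrete, here is a hedged sketch; the class and method names are hypothetical. It asks the char-based Character.isLetter(char) about each half of a Plane-1 letter, then asks the int overload Character.isLetter(int) about the whole character. The int overload and String.codePointAt() are assumptions about the reader's JDK, since both arrived in J2SE 5.0, after this message was written.

```java
public class SupplementaryLetter {
    // U+10400 DESERET CAPITAL LETTER LONG I, as a surrogate pair.
    static final String DESERET = "\uD801\uDC00";

    // Char-at-a-time classification: true if any single char is a letter.
    static boolean anyCharIsLetter(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Code-point classification of the first character (J2SE 5.0 or later).
    static boolean firstCodePointIsLetter(String s) {
        return Character.isLetter(s.codePointAt(0));
    }

    public static void main(String[] args) {
        System.out.println(anyCharIsLetter(DESERET));        // prints false
        System.out.println(firstCodePointIsLetter(DESERET)); // prints true
    }
}
```

The char-based check sees only two surrogates, so it cannot report that the string contains a letter; an int code point stands in here for the scalar-value class asked for above.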