RE: Java/Unicode brain damage
- From: Miles Sabin <msabin@interx.com>
- To: xml-dev@lists.xml.org
- Date: Wed, 25 Jul 2001 15:50:08 +0100
Elliotte Rusty Harold wrote,
> The Java way to handle this is to stop thinking of a Java char as
> representing a Unicode character. It doesn't. A Java char represents
> a UTF-16 code point, which may be a surrogate. The public API to
> java.lang.String is essentially a UTF-16 API. For example, the
> length() method of a string does not return the number of Unicode
> characters in the string. Rather it returns the number of UTF-16
> code points.
This is correct, but not yet officially documented in the Java
Language Specification. It got hammered out during the development
of the java.nio spec.
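To make the length() point concrete, here is a minimal sketch rather than anything from the spec. The class name Utf16LengthDemo and its helper methods are made up for illustration. It also assumes String.codePointCount and Character.isHighSurrogate, which were only added in Java 5, later than this message. In the Unicode standard's terms, what length() counts are UTF-16 code units.

```java
public class Utf16LengthDemo {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points); a surrogate pair counts once.
    // Assumes Java 5 or later for codePointCount.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF, written out as its surrogate pair.
        String clef = "\uD834\uDD1E";
        System.out.println(codeUnits(clef));   // 2
        System.out.println(codePoints(clef));  // 1
        // The first char is a surrogate, not a complete character.
        System.out.println(Character.isHighSurrogate(clef.charAt(0))); // true
    }
}
```

For a string with no characters outside the Basic Multilingual Plane, the two counts agree, which is why the difference is easy to overlook.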
Cheers,
Miles
--
Miles Sabin InterX
Internet Systems Architect 27 Great West Road
+44 (0)20 8817 4030 Middx, TW8 9AS, UK
msabin@interx.com http://www.interx.com/