At 1:29 PM -0500 1/10/02, Simon St.Laurent wrote:
>Surrogate pairs are very tricky critters that seem to me to require
>substantially more programming care than any other aspect of
>Unicode, and I suspect that developers will be cursing them for a
>long time to come.
>
You're only having trouble because Java's char type is brain-damaged
in that a Java char actually represents a UTF-16 code unit rather
than a Unicode character. If Java's char type were four bytes instead
of two, or an object instead of a primitive type, none of this would
be bothering you. Surrogate pairs are one of the things a good class
library should hide from you.
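A minimal sketch of the difference, assuming Java 8 or later. Character.isHighSurrogate and String.codePointCount arrived in Java 5 and String.codePoints() in Java 8, so none of them existed when this was written. The class and method names (SurrogateDemo, decodeByHand, decodeWithLibrary) are hypothetical. The first method combines each high/low surrogate pair by hand. The second lets the class library hide the pairs.

```java
import java.util.Arrays;

// Hypothetical demo class: decoding a String that may contain surrogate pairs.
class SurrogateDemo {

    // The hand-rolled way: walk the chars (UTF-16 code units) and combine
    // each high/low surrogate pair into a single code point.
    static int[] decodeByHand(String s) {
        int[] out = new int[s.length()];   // upper bound: one code point per char
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                char low = s.charAt(++i);
                // Same arithmetic as Character.toCodePoint(c, low)
                out[n++] = ((c - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
            } else {
                out[n++] = c;  // BMP char, or an unpaired surrogate passed through
            }
        }
        return Arrays.copyOf(out, n);
    }

    // The library way: String.codePoints() hides the pairs from the caller.
    static int[] decodeWithLibrary(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        // U+1D11E (musical G clef) lies outside the BMP, so UTF-16 needs two chars for it.
        String s = "a\uD834\uDD1Eb";
        System.out.println(s.length());                               // 4 chars
        System.out.println(s.codePointCount(0, s.length()));          // 3 code points
        System.out.println(Integer.toHexString(decodeByHand(s)[1]));  // 1d11e
        System.out.println(Arrays.equals(decodeByHand(s), decodeWithLibrary(s))); // true
    }
}
```

The hand-written loop is exactly the bookkeeping that a char-at-a-time API pushes onto every caller. Code that uses codePoints() never sees a surrogate unless the string contains an unpaired one.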
It could be worse, though. You could be using C, and trying to decode
UTF-8. :-)
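For comparison, here is a minimal sketch of decoding UTF-8 by hand. It is written in Java to stay in one language, but the bit manipulation is the same in C. In real Java code you would simply call new String(bytes, StandardCharsets.UTF_8). The class name Utf8Demo is hypothetical. This sketch rejects truncated sequences, bad continuation bytes, overlong forms, encoded surrogates, and values above U+10FFFF. It does not attempt any error recovery.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: a hand-rolled UTF-8 decoder returning code points.
class Utf8Demo {
    static int[] decode(byte[] b) {
        int[] out = new int[b.length];   // at most one code point per byte
        int n = 0;
        int i = 0;
        while (i < b.length) {
            int lead = b[i] & 0xFF;      // Java bytes are signed; mask to 0..255
            int len, cp, min;            // sequence length, accumulator, smallest legal value
            if (lead < 0x80)                { len = 1; cp = lead;        min = 0; }
            else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; min = 0x80; }
            else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min = 0x800; }
            else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min = 0x10000; }
            else throw new IllegalArgumentException("bad lead byte at " + i);
            if (i + len > b.length) throw new IllegalArgumentException("truncated at " + i);
            for (int k = 1; k < len; k++) {
                int cont = b[i + k] & 0xFF;
                // Every continuation byte must look like 10xxxxxx
                if ((cont & 0xC0) != 0x80)
                    throw new IllegalArgumentException("bad continuation at " + (i + k));
                cp = (cp << 6) | (cont & 0x3F);
            }
            // Reject overlong forms, surrogates, and values beyond U+10FFFF
            if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
                throw new IllegalArgumentException("invalid code point at " + i);
            out[n++] = cp;
            i += len;
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        String s = "a\u00E9\u20AC\uD834\uDD1E";         // a, e-acute, euro sign, U+1D11E
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);                // 1 + 2 + 3 + 4 = 10
        System.out.println(Arrays.equals(decode(utf8), s.codePoints().toArray())); // true
    }
}
```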
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| The XML Bible, 2nd Edition (Hungry Minds, 2001) |
| http://www.ibiblio.org/xml/books/bible2/ |
| http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/ |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://www.cafeaulait.org/ |
| Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/ |
+----------------------------------+---------------------------------+