Bob Foster wrote:
> I'm puzzled. What is the "aha moment" here? Your point seems to be that Java
> char != Unicode character. True. Exactly like UTF-8 octet != Unicode
> character. The fact that half a surrogate pair is not a Unicode character
> doesn't seem like breaking news.
The 'aha' moment was the point that it's safer to use strings rather
than characters as the primitives of your API, because what looks like
a single character to a human may be a composition of several Unicode
characters, which looks like a string to the program.
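
To make that concrete, here is a rough sketch rather than anything definitive. It assumes Java 5 or later (for String.codePointCount), and the class and method names are made up for illustration. It counts the same text three ways: as Java chars, as Unicode code points, and as what java.text.BreakIterator considers a user-visible character.

```java
import java.text.BreakIterator;

// Hypothetical demo class; the names are illustrative only.
final class CharVsString {

    // Number of Java chars, i.e. UTF-16 code units.
    public static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (a surrogate pair counts once).
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of user-perceived characters, as approximated by the
    // JDK's character BreakIterator for the default locale.
    public static int perceivedChars(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String eAcute = "e\u0301";      // 'e' followed by COMBINING ACUTE ACCENT
        String gClef = "\uD834\uDD1E";  // U+1D11E as a surrogate pair

        System.out.println(codeUnits(eAcute) + " chars, "
                + codePoints(eAcute) + " code points, "
                + perceivedChars(eAcute) + " perceived");
        System.out.println(codeUnits(gClef) + " chars, "
                + codePoints(gClef) + " code point(s)");
    }
}
```

The accented e is two chars and two code points, yet a reader sees one character. An API that hands out a single char, or even a single code point, cannot represent it; an API that hands out a String can.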
> Do you mean to say that use of UTF-16 character encoding in a programming
> language is broken as designed? In the perfect language of your own design,
> would you have the "char" type be 32 bits? Is that what this is all about?
I'm in the middle of a series of essays on this over at 'ongoing'.
Cheers, Tim Bray
(ongoing fragmented essay: http://www.tbray.org/ongoing/)