OASIS Mailing List Archives

Re: Java/Unicode brain damage



Miles,

> A Java 'char' is a 16 bit data type, so it simply isn't possible for
> it to directly represent a Unicode character. 

Could you elaborate?  There's a section in my Unicode book
(in another city :) that talks about surrogates.  There's a sense
in which "if it's listed there, it's a kind of character".

The word "character" is heavily overloaded, but I think it's
clear that in at least one sense a Java "char" _is_ what folk
call a "character".  That's just how the word is used, even
if it's arguably sloppy usage for other contexts.

It would likely be instructive to have someone explain
the senses in which "char" is, and isn't, a character.
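
As a rough illustration of that distinction, here is a minimal sketch that counts both 16-bit char values and code points for the same strings. It assumes a Java 5 or later runtime, where the code point methods on String and Character exist; the class name CharVsCodePoint and the helpers codeUnits and codePoints are made up for this example.

```java
public class CharVsCodePoint {
    // Number of 16-bit char values (UTF-16 code units) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+0041 fits in one char, so char and code point coincide.
        String a = "A";
        // U+1D11E (MUSICAL SYMBOL G CLEF) lies beyond 16 bits and
        // is stored as a surrogate pair of two char values.
        String clef = new String(Character.toChars(0x1D11E));

        System.out.println(codeUnits(a) + " " + codePoints(a));       // prints "1 1"
        System.out.println(codeUnits(clef) + " " + codePoints(clef)); // prints "2 1"

        // Each half of the pair is a valid char, but neither is a
        // complete code point on its own.
        System.out.println(Character.isHighSurrogate(clef.charAt(0))); // prints "true"
        System.out.println(Character.isLowSurrogate(clef.charAt(1)));  // prints "true"
    }
}
```

So a char is a "character" in the sense that each surrogate has its own listed code value, and is not one in the sense that a single char cannot hold every code point.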

Likewise the senses in which combining marks relate
to the concept of a character ... "character" is actually
a rather complex notion, and ISO 10646 code points
are (as I understand it) not necessarily going to be able
to represent a "character" either (32 bits vs. 16).
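
The combining-mark case can be sketched the same way: one visible letter can be spelled as one code point or as two. This assumes a Java 6 or later runtime, where java.text.Normalizer is available; the class name CombiningMarks and the helper nfc are made up for this example.

```java
import java.text.Normalizer;

public class CombiningMarks {
    // Compose base letters and combining marks where a precomposed form exists.
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9";  // LATIN SMALL LETTER E WITH ACUTE
        String decomposed = "e\u0301";  // "e" followed by COMBINING ACUTE ACCENT

        // Same letter to a reader, different code point sequences.
        System.out.println(precomposed.equals(decomposed));                    // prints "false"
        System.out.println(decomposed.codePointCount(0, decomposed.length())); // prints "2"

        // U+0301 is classified as a non-spacing mark, not a letter.
        System.out.println(Character.getType('\u0301') == Character.NON_SPACING_MARK); // prints "true"

        // After NFC normalization the two spellings compare equal.
        System.out.println(nfc(decomposed).equals(precomposed));               // prints "true"
    }
}
```

Here neither a char nor a single code point lines up with what a reader would call one character until the text is normalized, and not every base-plus-mark combination has a precomposed form.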

- Dave