Tim Bray scripsit:
> Almost. How about we leave it as wchar_t, but *not* UTF-16, so a value
> that's in a surrogate block is an error. Then we change the name from
codePoint (which could be interpreted as meaning "UTF-16 Code Point") to
> something more explicit like
>
> numericValueCorrespondingToAUnicodeCharacterAsInUPlusFourHexDigitsIsThatClear
>
> John Cowan has suggested that "codeUnit" might be a good name, I'd be
> inclined to "uniChar", any other ideas?
I must have unintentionally misled you. A "code point" is an integer
in the range 0-0x10FFFF; Unicode maps characters to code points. "Code
units" are chunks o' bits: UTF-8, UTF-16, and UTF-32 map code points to
8-bit code units, 16-bit code units, and 32-bit code units respectively.
"UTF-16 code point" is a contradiction in terms.
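
To make the distinction concrete, here is a minimal sketch in C that counts how many code units each encoding form needs for a single code point. The function names (encodable, utf8_units, utf16_units, utf32_units, code_unit_demo) are hypothetical and are not part of genx; U+10330, a Gothic letter, is used only as a sample code point outside the 16-bit range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper names for illustration; none of this is genx API. */

/* A code point can be encoded only if it is in range and outside the
   surrogate block D800-DFFF, which is reserved for UTF-16 code units. */
static int encodable(unsigned long cp)
{
    return cp <= 0x10FFFFUL && !(cp >= 0xD800UL && cp <= 0xDFFFUL);
}

/* Number of 8-bit code units UTF-8 uses for cp, or 0 if not encodable. */
size_t utf8_units(unsigned long cp)
{
    if (!encodable(cp)) return 0;
    if (cp < 0x80UL) return 1;
    if (cp < 0x800UL) return 2;
    if (cp < 0x10000UL) return 3;
    return 4;
}

/* Number of 16-bit code units UTF-16 uses for cp, or 0 if not encodable.
   Code points of 0x10000 and above take a surrogate pair. */
size_t utf16_units(unsigned long cp)
{
    if (!encodable(cp)) return 0;
    return cp < 0x10000UL ? 1 : 2;
}

/* UTF-32 always uses exactly one 32-bit code unit per encodable code point. */
size_t utf32_units(unsigned long cp)
{
    return encodable(cp) ? 1 : 0;
}

/* One code point, three different code unit counts. */
void code_unit_demo(void)
{
    unsigned long gothic = 0x10330UL;  /* a single code point */
    assert(utf8_units(gothic) == 4);
    assert(utf16_units(gothic) == 2);
    assert(utf32_units(gothic) == 1);
}
```

The code point is the same integer in all three cases; only the number and width of the code units change.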
However, on reflection I think that the Right Thing is to use
wchar_t directly in the API, since the whole point of using it is for
compatibility with other wchar_t-aware routines, either standardized
or platform-specific. There is no point in hiding it behind a type name.
(As I said, if your platform has 8-bit wchar_t's, you deserve to lose.)
> If someone wants to put a generic UTF-16 processor on top of genx, that
> would be fine. I don't see the demand for supporting it at the input
> end of genx because the UTF-16 centric languages like Java and C# have
> decent xml-writing software already. -Tim
C and C++ on the Windows platform *are* UTF-16 centric. If you put
a Gothic character into an L"..." string, for example, it will produce
a string which is three wchar_t's long on Windows, whereas on Unix it
will be two wchar_t's long (including the trailing 0 in both cases). As I
said, the additional code for converting UTF-16 (as opposed to UTF-32)
into UTF-8 is very small, and can be conditionalized on sizeof(wchar_t).
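
A rough sketch of what that conditional might look like, assuming a hypothetical helper named wcs_to_utf8 (not part of genx) that takes a 0-terminated wchar_t string and a caller-supplied output buffer. When wchar_t is 16 bits wide, a surrogate pair is combined into one code point; when it is wider, any value in the surrogate block is treated as an error. The return convention of (size_t)-1 for failure is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper, not genx API. Converts the 0-terminated wchar_t
   string `in` to 0-terminated UTF-8 in `out` (capacity `outsize` bytes).
   Returns the number of bytes written, not counting the terminator, or
   (size_t)-1 on a bad surrogate, an out-of-range value, or a full buffer. */
size_t wcs_to_utf8(const wchar_t *in, unsigned char *out, size_t outsize)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    if (outsize == 0) return (size_t)-1;

    while (*in != 0) {
        unsigned long cp = (unsigned long)*in++;
        size_t len;

        if (cp >= 0xD800UL && cp <= 0xDFFFUL) {
            unsigned long lo;
            /* Wide wchar_t holds code points directly: surrogates are errors. */
            if (sizeof(wchar_t) > 2) return (size_t)-1;
            /* 16-bit wchar_t: input is UTF-16, so expect a high/low pair. */
            lo = (unsigned long)*in;
            if (cp > 0xDBFFUL || lo < 0xDC00UL || lo > 0xDFFFUL)
                return (size_t)-1;
            cp = 0x10000UL + ((cp - 0xD800UL) << 10) + (lo - 0xDC00UL);
            in++;
        }
        if (cp > 0x10FFFFUL) return (size_t)-1;

        len = cp < 0x80UL ? 1 : cp < 0x800UL ? 2 : cp < 0x10000UL ? 3 : 4;
        /* Need room for len bytes plus the terminating 0. */
        if (outsize - n <= len) return (size_t)-1;

        if (len == 1) {
            out[n++] = (unsigned char)cp;
        } else if (len == 2) {
            out[n++] = (unsigned char)(0xC0 | (cp >> 6));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (len == 3) {
            out[n++] = (unsigned char)(0xE0 | (cp >> 12));
            out[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[n++] = (unsigned char)(0xF0 | (cp >> 18));
            out[n++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    out[n] = 0;
    return n;
}
```

Called as wcs_to_utf8(L"\U00010330", buf, sizeof buf), this should yield the same four UTF-8 bytes whether the compiler stored the literal as one 32-bit wchar_t or as a surrogate pair. The sizeof test is an ordinary if here because sizeof cannot be used in #if; a compiler will normally fold it at compile time.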
--
John Cowan  jcowan@reutershealth.com
http://www.reutershealth.com  http://www.ccil.org/~cowan
As you read this, I don't want you to feel sorry for me, because, I believe
everyone will die someday. -- From a Nigerian-type scam spam I got