Tim Bray wrote:
> jcowan wrote:
> > C and C++ on the Windows platform *are* UTF-16 centric. If you put
> > a Gothic character into a "..."L string, for example
>
> So you're saying that it would be satisfactory for genx to infer that if
>
> sizeof(wchar_t) == 2
>
> then the values are UTF-16 code units? -Tim
I'd say that depends on what degree of portability you're
after, and whether or not you use any of the wcs* or mb*
standard library routines.
If you want it to be strictly-conforming C, that's *not* a
safe assumption. If OTOH you only need it to be portable to
a plurality of relatively modern, not-too-badly-braindamaged
systems, it's probably OK.
More specifically: if sizeof(wchar_t) == 2 and CHAR_BIT == 8
(NBBY, in BSD terms), then you can safely assume that a wchar_t
can *hold* any UCS-2 code point, i.e. anything in the BMP. You
should *not* assume that the compiler and C standard library
will interpret them as such.
Nor should you assume that the compiler and C standard library
will interpret multibyte sequences as UTF-8 (many don't).
You should *definitely* not assume that wchar_t's are UTF-16
code units: any implementation that does so is just plain wrong --
UTF-16 is a variable-width encoding (unless you restrict
it to the BMP, in which case it's identical to UCS-2).
--Joe English
jenglish@flightlab.com