Joe English scripsit:
> The 'codePoint' typedef may be problematic:
>
> // Unicode code points (4-byte int on most systems)
> typedef wchar_t codePoint;
>
> The C standard makes no useful guarantees about
> the size or interpretation of 'wchar_t'. On some
> systems it's identical to plain 'char', and even
> on systems where it's big enough to hold all of
> Unicode, there's no guarantee about what encoding
> the wcs* and *wcs functions use. wchar_t should
> not be used in programs that are meant to generate
> portable data and be portable themselves; you just
> don't know what you're going to get.
I have argued privately that wchar_t is in fact the Right Thing here
despite its variability in size (UTF-32 on Unix platforms, UTF-16 on
Windows), because it makes genx compatible with both standardized and
non-standardized facilities, most especially L"..." strings. Some
conditional logic will be needed to interpret the input as UTF-16 or
UTF-32, which can be based on sizeof(wchar_t). Hypothetical platforms
where sizeof(wchar_t) == 1 can be neglected.
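
To make the sizeof(wchar_t) test concrete, here is a rough sketch of the sort of conditional logic I have in mind. The name next_code_point is hypothetical and is not part of genx; the sketch assumes that a 2-byte wchar_t holds UTF-16 and that any wider wchar_t holds UTF-32, which the C standard does not promise.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not part of genx): decode one Unicode code point
 * from a wchar_t string and advance *pp past the units consumed.
 * Assumption: 2-byte wchar_t means UTF-16, anything wider means UTF-32.
 * Returns the code point, or -1 for a malformed or out-of-range unit. */
static long next_code_point(const wchar_t **pp)
{
    const wchar_t *p = *pp;
    unsigned long c = (unsigned long)*p++;

    if (sizeof(wchar_t) == 2) {
        c &= 0xFFFFUL;                      /* guard against a signed wchar_t */
        if (c >= 0xD800 && c <= 0xDBFF) {   /* high surrogate: need a low one */
            unsigned long lo = (unsigned long)*p & 0xFFFFUL;
            if (lo < 0xDC00 || lo > 0xDFFF) {
                *pp = p;
                return -1;                  /* unpaired high surrogate */
            }
            p++;
            c = 0x10000UL + ((c - 0xD800) << 10) + (lo - 0xDC00);
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            *pp = p;
            return -1;                      /* stray low surrogate */
        }
    } else {
        /* UTF-32: reject surrogates and values beyond U+10FFFF */
        if (c > 0x10FFFFUL || (c >= 0xD800 && c <= 0xDFFF)) {
            *pp = p;
            return -1;
        }
    }

    *pp = p;
    return (long)c;
}
```

A caller would loop while **pp is nonzero, handing each result to whatever checks and emits the character. Since sizeof(wchar_t) is a compile-time constant, a decent compiler should discard the branch that does not apply.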
--
John Cowan  jcowan@reutershealth.com
http://www.reutershealth.com  http://www.ccil.org/~cowan
He made the Legislature meet at one-horse tank-towns out in the alfalfa
belt, so that hardly nobody could get there and most of the leaders would
stay home and let him go to work and do things as he pleased.
        --Mencken, _Declaration of Independence_