Re: Java/Unicode brain damage
- From: Duane Nickull <duane@xmlglobal.com>
- To: Joel Rees <rees@mediafusion.co.jp>
- Date: Thu, 26 Jul 2001 06:25:37 -0700
Joel Rees wrote:
> The char in C is a byte, and most C libraries assume strings are built of
> bytes, so C tends to use variable-width characters. (Read that as UTF-8 for
> Unicode.) You can't back up safely with shift-JIS, so you sometimes dump
> things temporarily to fixed-width buffers when you need random access.
> Although you can back up safely with UTF-8, it's still sometimes convenient
> to temporarily dump a UTF-8 string to a constant width buffer. Since these
> buffers are rather local in nature (can't be worked on by most of the
> standard libraries at this time), widening them to 32 bits when 16 bits had
> been used does not usually cause any ripples.
>>>>>>>>>>
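Joel's point about backing up can be made concrete. The following is a minimal C sketch, assuming well-formed UTF-8 input. UTF-8 continuation bytes always match the bit pattern 10xxxxxx, so stepping back to the previous character only requires skipping those bytes. Shift-JIS trail bytes share values with both ASCII and lead bytes, so no such local test exists there. The second function illustrates the "dump to a fixed-width buffer" idea by decoding into 32-bit code points for random access. The names utf8_prev and utf8_to_utf32 are hypothetical and do not come from any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first byte of the character that precedes
   position pos. UTF-8 continuation bytes always look like 10xxxxxx,
   so we simply skip backwards over them. */
static size_t utf8_prev(const unsigned char *s, size_t pos)
{
    if (pos == 0)
        return 0;
    pos--;
    while (pos > 0 && (s[pos] & 0xC0) == 0x80)
        pos--;
    return pos;
}

/* Decode UTF-8 into a fixed-width buffer of 32-bit code points so the
   text can be indexed randomly. Assumes well-formed input: overlong
   forms, surrogates and stray continuation bytes are NOT rejected.
   Stops at a truncated trailing sequence or when cap is reached.
   Returns the number of code points written. */
static size_t utf8_to_utf32(const unsigned char *s, size_t len,
                            uint32_t *out, size_t cap)
{
    size_t i = 0, n = 0;

    while (i < len && n < cap) {
        unsigned char b = s[i];
        uint32_t cp;
        size_t extra, k;

        if (b < 0x80) {                  /* 0xxxxxxx: ASCII */
            cp = b;
            extra = 0;
        } else if ((b & 0xE0) == 0xC0) { /* 110xxxxx: 2-byte sequence */
            cp = b & 0x1F;
            extra = 1;
        } else if ((b & 0xF0) == 0xE0) { /* 1110xxxx: 3-byte sequence */
            cp = b & 0x0F;
            extra = 2;
        } else {                         /* assume 11110xxx: 4-byte sequence */
            cp = b & 0x07;
            extra = 3;
        }

        if (i + extra >= len)            /* sequence runs past the end */
            break;

        /* Each continuation byte contributes its low 6 bits. */
        for (k = 1; k <= extra; k++)
            cp = (cp << 6) | (s[i + k] & 0x3F);

        out[n++] = cp;
        i += extra + 1;
    }
    return n;
}
```

Because the decoder does no validation, real code would also need to reject malformed sequences before trusting the fixed-width buffer.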
Do you know whether the C++ STL works in a similar way? It is usually a
pain to write portable C and C++ programs that support UTF-16. After the
last Unicode conference, I saw papers proposing a C/C++ language extension
to support portable programs that use UTF-16.
One of the main problems discussed concerned string literals. While it is
apparently not rocket science to write portable C and C++ programs that use
a fixed 16-bit (unsigned) integral type as the character type, doing so
often means that you cannot use string literals or the platform's runtime
libraries.
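To illustrate that trade-off, here is a minimal C sketch built on a hypothetical typedef u16char for a 16-bit code unit. An ordinary literal such as "cafe" has type char[], and an L"..." literal produces wchar_t, whose width differs between platforms (16 bits on Windows, 32 bits on most Unix systems), so neither yields a portable UTF-16 string; the text has to be spelled out as code units. Likewise strlen() and wcslen() do not apply, so even the length function must be written by hand. The names u16char, cafe and u16_strlen are illustrative only.

```c
#include <stddef.h>

/* Hypothetical 16-bit code-unit type. unsigned short is guaranteed to
   hold at least 16 bits, so every UTF-16 code unit fits in it. */
typedef unsigned short u16char;

/* The word "cafe" with an accented final e (U+00E9), written out as
   UTF-16 code units and NUL-terminated. Numeric values are used instead
   of 'c', 'a', ... because the compiler's execution character set is not
   guaranteed to be ASCII. */
static const u16char cafe[] = { 0x0063, 0x0061, 0x0066, 0x00E9, 0x0000 };

/* strlen() works on char and wcslen() on wchar_t, so neither applies
   here; this piece of the runtime library is re-implemented for u16char. */
static size_t u16_strlen(const u16char *s)
{
    const u16char *p = s;
    while (*p != 0)
        p++;
    return (size_t)(p - s);
}
```

Every other string routine the program needs (comparison, copying, formatting) would have to be rewritten the same way, which is the cost the conference papers were trying to remove.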
C'est la vie...
Duane Nickull