OASIS Mailing List Archives




Re: Java/Unicode brain damage

Joel Rees wrote:

> The char in C is a byte, and most C libraries assume strings are built of
> bytes, so C tends to use variable width characters. (Read that as UTF-8 for
> Unicode.) You can't back up safely with shift-JIS, so you sometimes dump
> things temporarily to fixed-width buffers when you need random access.
> Although you can back up safely with UTF-8, it's still sometimes convenient
> to temporarily dump a UTF-8 string to a constant width buffer. Since these
> buffers are rather local in nature (can't be worked on by most of the
> standard libraries at this time), widening them to 32 bits when 16 bits had
> been used does not usually cause any ripples.
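
The temporary widening described above might look roughly like this in C.
This is only a sketch: the function name utf8_to_utf32 is hypothetical, the
input is assumed to be NUL-terminated UTF-8, and overlong forms and
surrogate values are not rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a NUL-terminated UTF-8 string into a
 * fixed-width buffer of 32-bit code points. Returns the number of code
 * points written, or (size_t)-1 if a sequence is malformed or truncated
 * or if the buffer is too small. */
size_t utf8_to_utf32(const char *src, uint32_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*s != 0) {
        uint32_t cp;
        int extra;  /* number of continuation bytes expected */

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;  /* stray continuation or invalid lead byte */
        s++;

        while (extra-- > 0) {
            if ((*s & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*s & 0x3F);
            s++;
        }

        if (n == cap) return (size_t)-1;  /* fixed-width buffer is full */
        dst[n++] = cp;
    }
    return n;
}
```

Once the text is in the buffer, dst[i] is the i-th character, which is the
random access the variable-width form does not give you. Because the
buffer stays local, its element width is a private detail of the function
that owns it.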


Do you know if the C++ STL operates in a similar fashion? It is usually
a pain to write portable C and C++ programs that support UTF-16. After
the last Unicode conference, I saw papers proposing a C/C++ language
extension to make portable UTF-16 programs possible.

One of the main problems discussed pertained to literal strings. While it
is apparently not rocket science to write portable C and C++ programs
using a fixed 16-bit (unsigned) integral data type as the character, it
often means that you cannot use literal strings or the runtime libraries.
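
To make the literal-string problem concrete, here is a rough sketch. The
names utf16_unit, greeting and utf16_len are hypothetical, and the
character constants assume an ASCII-based execution character set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-bit code unit type; uint16_t is exactly 16 bits
 * wherever <stdint.h> provides it. */
typedef uint16_t utf16_unit;

/* An ordinary "..." literal is an array of char, not of utf16_unit, so
 * the text has to be spelled out one code unit at a time, with an
 * explicit terminator. 0x00E9 is U+00E9 (e with acute accent). */
static const utf16_unit greeting[] = { 'H', 'i', 0x00E9, 0 };

/* strlen() works on char, so a hand-written stand-in is needed for
 * even the most basic runtime library function. */
size_t utf16_len(const utf16_unit *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != 0) {
        n++;
    }
    return n;
}
```

Here utf16_len(greeting) counts three code units. Every other string
routine would need the same kind of hand-written replacement.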

C'est la vie...

Duane Nickull