   Re: [xml-dev] Genx



Tim Bray wrote:

> jcowan wrote:
> > C and C++ on the Windows platform *are* UTF-16 centric.  If you put
> > a Gothic character into an L"..." string, for example
>
> So you're saying that it would be satisfactory for genx to infer that if
>
>     sizeof(wchar_t) == 2
>
> then the values are UTF16 coded units? -Tim


I'd say that depends on what degree of portability you're
after, and whether or not you use any of the wcs* or mb*
standard library routines.

If you want it to be strictly-conforming C, that's *not* a
safe assumption.  If OTOH you only need it to be portable to
a plurality of relatively modern, not-too-badly-braindamaged
systems, it's probably OK.

More specifically: if sizeof(wchar_t) == 2 and CHAR_BIT == 8,
then you can safely assume that a wchar_t can hold a UCS-2
code point.  You should *not* assume that the compiler and C
standard library will interpret them as such.
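
To make the storage-vs-interpretation distinction concrete, here is a
rough sketch of how a library might classify the platform.  The enum
and function names are made up for illustration (they are not part of
genx); the only standard pieces are CHAR_BIT, sizeof(wchar_t), and the
C99 macro __STDC_ISO_10646__, which an implementation defines when it
promises that wchar_t values are ISO 10646 code points.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical labels for what a library could conclude about wchar_t. */
enum wide_policy {
    WIDE_UNKNOWN,        /* no usable conclusion */
    WIDE_16BIT_STORAGE,  /* 16 bits of storage, interpretation unknown */
    WIDE_ISO10646        /* implementation promises ISO 10646 values */
};

/* Pure decision function, so the rule can be checked for any platform. */
static enum wide_policy classify_wide(size_t wchar_size, int char_bit,
                                      int promises_iso10646)
{
    assert(char_bit >= 8);  /* the C standard guarantees CHAR_BIT >= 8 */
    if (promises_iso10646)
        return WIDE_ISO10646;
    if (wchar_size * (size_t)char_bit == 16)
        return WIDE_16BIT_STORAGE;  /* room for UCS-2, nothing more implied */
    return WIDE_UNKNOWN;
}

/* Apply the rule to the platform this file is compiled on. */
static enum wide_policy classify_this_platform(void)
{
#ifdef __STDC_ISO_10646__
    return classify_wide(sizeof(wchar_t), CHAR_BIT, 1);
#else
    return classify_wide(sizeof(wchar_t), CHAR_BIT, 0);
#endif
}
```

Note that WIDE_16BIT_STORAGE only says the bits fit; it says nothing
about what wcslen() and friends think those bits mean.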

Nor should you assume that the compiler and C standard library
will interpret multibyte sequences as UTF-8 (many don't).

You should *definitely* not assume that wchar_t's are UTF-16 code
units: any implementation that does so is just plain wrong --
UTF-16 is a variable-width encoding (unless you restrict
it to the BMP, in which case it's the same as UCS-2).
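
For anyone who wants to see the variable-width part spelled out, a
minimal sketch of a UTF-16 encoder follows.  The function name
utf16_encode is hypothetical; the arithmetic is the standard surrogate
pair construction.  A BMP character comes out as one 16-bit unit, but
a Gothic letter such as U+10330 needs two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-16.
 * Writes 1 or 2 code units to out and returns how many were written.
 * Returns 0 for values that are not encodable: surrogate code points
 * (U+D800..U+DFFF) and anything above U+10FFFF.
 */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);

    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;               /* surrogates are not characters */
    if (cp > 0x10FFFF)
        return 0;               /* outside the Unicode code space */

    if (cp < 0x10000) {         /* BMP: identical to UCS-2 */
        out[0] = (uint16_t)cp;
        return 1;
    }

    cp -= 0x10000;              /* 20 bits left to split in two */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

So U+0041 encodes as the single unit 0x0041, while U+10330 encodes as
the pair 0xD800 0xDF30.  Code that counts wchar_t elements and calls
the result "characters" gets the second case wrong.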


--Joe English

  jenglish@flightlab.com