Tim Bray wrote:
> jcowan wrote:
> > C and C++ on the Windows platform *are* UTF-16 centric. If you put
> > a Gothic character into a "..."L string, for example
>
> So you're saying that it would be satisfactory for genx to infer that if
>
> sizeof(wchar_t) == 2
>
> then the values are UTF-16 code units? -Tim
I'd say that depends on what degree of portability you're
after, and whether or not you use any of the wcs* or mb*
standard library routines.
If you want it to be strictly-conforming C, that's *not* a
safe assumption. If OTOH you only need it to be portable to
a plurality of relatively modern, not-too-badly-braindamaged
systems, it's probably OK.
More specifically: if sizeof(wchar_t) == 2 and CHAR_BIT == 8
(NBBY, in BSD terms), then you can safely assume that a wchar_t
can *hold* any UCS-2 code point, i.e. anything in the BMP. You
should *not* assume that the compiler and C standard library
will interpret them as such.
Nor should you assume that the compiler and C standard library
will interpret multibyte sequences as UTF-8 (many don't).
You should *definitely* not assume that wchar_t's are UTF-16
code units: any implementation that does so is just plain wrong --
UTF-16 is a variable-width encoding (unless you restrict
it to the BMP, in which case it's identical to UCS-2).
--Joe English
jenglish@flightlab.com