OASIS Mailing List Archives





   Re: Unicode confusion

  • From: David Brownell <david-b@pacbell.net>
  • To: xml-dev@ic.ac.uk
  • Date: Tue, 04 Jan 2000 08:56:35 -0800

David Megginson wrote:
> roddey@us.ibm.com writes:
> > If anything, it should go the other way. Unicode should be the core
> > API, and there should be helper API to allow the use of local code
> > page chars where necessary. Everything should be set up to optimize
> > use of the Unicode API, with local code page use paying the price,
> > since Unicode is the more desirable format.

I took that as referring to 16-bit character codes vs. variable-width
or 32-bit ones.  And when I take it that way, I agree!  (However, the
notion of a "Unicode API" struck me as strange; the spec has no API.)

> No one's disagreeing with the use of Unicode; we're talking about
> which character encoding we'll use to represent it.  You can represent
> Unicode in variable-width 8-bit or 16-bit encodings or in fixed-width
> 32-bit encodings.
> Note that Java uses UTF-16, which isn't quite fixed-width, though no
> one really notices.

... no one really notices "yet"!  Unicode is still rolling out, in the
big picture, and most people now using it have little reason to notice.
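
To make the fixed/variable-width distinction concrete, here is a small sketch in Python (chosen only for illustration; nothing in this thread uses it). It counts how many bytes a few sample characters need in each encoding form. The sample characters and the `encoded_sizes` helper are hypothetical choices of mine. UTF-8 needs 1 to 4 bytes, UTF-16 needs one or two 16-bit units (2 or 4 bytes), and UTF-32 always needs 4.

```python
# Compare the encoded size of the same characters in UTF-8, UTF-16 and UTF-32.
# The sample characters below are illustrative choices, not from the thread.
SAMPLES = {
    "A": "LATIN CAPITAL LETTER A (U+0041)",
    "\u00e9": "LATIN SMALL LETTER E WITH ACUTE (U+00E9)",
    "\u20ac": "EURO SIGN (U+20AC)",
    "\U0001D11E": "MUSICAL SYMBOL G CLEF (U+1D11E)",
}


def encoded_sizes(ch: str) -> dict:
    """Return the encoded length in bytes for each encoding form."""
    return {
        # The '-be' variants are used so no byte-order mark is prepended.
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


if __name__ == "__main__":
    for ch, name in SAMPLES.items():
        s = encoded_sizes(ch)
        print(f"{name:42} utf-8={s['utf-8']} utf-16={s['utf-16']} utf-32={s['utf-32']}")
```

Only the G clef, which lies outside the 16-bit range, takes 4 bytes in UTF-16; that is the case "no one really notices" until they hit it.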

One way in which UTF-16 (and Unicode) isn't fixed width is the
existence of "surrogate pairs", where two 16-bit values are combined to
represent a character outside the range 16 bits can hold (code points
U+10000 through U+10FFFF).  (For those who didn't know that!)  It's the
existence of such pairs that makes some folk argue that a 32-bit
character code is the way to go (and they persuaded most SysV UNIX
platforms to put a 32-bit wchar_t in their ABI accordingly).
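
For the curious, the surrogate-pair arithmetic is short. The sketch below (Python again, purely illustrative; the function names are hypothetical) follows the UTF-16 rule: subtract 0x10000 to get a 20-bit value, put its top 10 bits in a high surrogate (D800..DBFF) and its bottom 10 bits in a low surrogate (DC00..DFFF).

```python
def to_surrogate_pair(cp: int) -> tuple:
    """Split a supplementary code point (U+10000..U+10FFFF) into a
    UTF-16 high/low surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points need a surrogate pair")
    v = cp - 0x10000               # 20-bit value
    high = 0xD800 + (v >> 10)      # top 10 bits
    low = 0xDC00 + (v & 0x3FF)     # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    pair = to_surrogate_pair(0x1D11E)           # MUSICAL SYMBOL G CLEF
    print([hex(u) for u in pair])               # ['0xd834', '0xdd1e']
    print(hex(from_surrogate_pair(*pair)))      # 0x1d11e
```

A program that treats each 16-bit unit as a character will see two "characters" here, which is exactly the trap described above.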

However, another way they aren't fixed width is that "combining"
characters get used.  Diacritical marks, for instance, aren't always
part of the base character: "é" can be the single code point U+00E9,
or "e" followed by U+0301 COMBINING ACUTE ACCENT.  In my book, the
existence of such features means there's no point in a 32-bit character
code, since even apps using a full ISO-10646 encoding (32-bit) still
need to deal with such issues.
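
A short sketch of why 32 bits alone doesn't help, again in Python for illustration only. Python strings index by code point, so `len` here counts what a UTF-32 representation would count. The `naive_char_count` helper is a hypothetical simplification: it just skips combining marks, whereas proper user-perceived character segmentation (Unicode UAX #29) covers many more cases.

```python
import unicodedata

# "é" written two ways: one precomposed code point, or "e" + combining acute.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT


def naive_char_count(s: str) -> int:
    """Count code points that are not combining marks.
    Rough approximation only, not full grapheme segmentation."""
    return sum(1 for c in s if unicodedata.combining(c) == 0)


if __name__ == "__main__":
    print(len(precomposed), len(decomposed))   # 1 2
    print(precomposed == decomposed)           # False
    # Normalization (NFC composes, NFD decomposes) makes them comparable.
    print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
    print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
    print(naive_char_count(decomposed))        # 1
```

Even with one 32-bit unit per code point, the same visible character can occupy one or two units, so an application still needs normalization or combining-aware logic.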

- Dave

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)



Copyright 2001 XML.org. This site is hosted by OASIS