


   Re: unicode confusion

  • From: Tim Bray <tbray@textuality.com>
  • To: "Fabio Arciniegas A." <l-arcini@uniandes.edu.co>, xml-dev@ic.ac.uk
  • Date: Tue, 04 Jan 2000 11:07:24 -0800

At 01:37 PM 1/4/00 -0500, Fabio Arciniegas A. wrote:
>> Note that Java uses UTF-16, which isn't quite fixed-width, though no
>> one really notices.
>Err... David, I thought Java used UTF-8, actually a version slightly
>different from the "typical" version that expresses:

Java has come with a succession of library classes that advertised
UTF-8 support; the first few iterations were so hopelessly broken that I 
gave up on them, but I've been told that recent versions are verging
on usable.
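
For anyone wondering what a "slightly different" UTF-8 might look like: one plausible candidate, and this is an assumption since the quoted message is cut off, is the modified UTF-8 written by DataOutputStream.writeUTF. The sketch below compares byte counts against standard UTF-8. It assumes a modern JDK (StandardCharsets did not exist in 2000), and the class and method names are invented for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8Demo {
    // Byte length of s in standard UTF-8.
    static int standardUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Byte length of s in the modified UTF-8 produced by writeUTF,
    // not counting the 2-byte length prefix that writeUTF emits first.
    static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size() - 2;
    }

    public static void main(String[] args) {
        String nul = "\0";                                       // U+0000
        String clef = new String(Character.toChars(0x1D11E));    // outside the BMP

        System.out.println(standardUtf8Length(nul));   // 1 (byte 00)
        System.out.println(modifiedUtf8Length(nul));   // 2 (bytes C0 80)
        System.out.println(standardUtf8Length(clef));  // 4 (one 4-byte sequence)
        System.out.println(modifiedUtf8Length(clef));  // 6 (each surrogate as 3 bytes)
    }
}
```

The two encodings agree on everything except U+0000 and characters outside the Basic Multilingual Plane, which is why the difference is easy to miss.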

What David was saying is that in Java, the basic "char" data type
is 16 bits, and thus is naturally used to hold UTF-16-encoded text.  I
have no idea if the library classes do the right things with UTF-16
surrogate pairs either in String or char[] contexts, but my experience 
with String processing in Java is that it's often best just to ignore
those libraries anyhow and roll your own. -Tim
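
To make the surrogate-pair point concrete, here is a minimal sketch of how a character outside the Basic Multilingual Plane occupies two 16-bit "char" values in a String. It assumes a JDK recent enough to have the code-point methods (Character.toChars, String.codePointCount), which postdate this message; the class and method names are made up for the example.

```java
public class SurrogateDemo {
    // Number of 16-bit char values (UTF-16 code units) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP, so
        // UTF-16 encodes it as the surrogate pair D834 DD1E.
        String clef = new String(Character.toChars(0x1D11E));

        System.out.println(codeUnits(clef));    // 2
        System.out.println(codePoints(clef));   // 1
        System.out.println(Integer.toHexString(clef.charAt(0)));  // d834
        System.out.println(Integer.toHexString(clef.charAt(1)));  // dd1e
        System.out.println(Character.isHighSurrogate(clef.charAt(0)));  // true
    }
}
```

Code that indexes by char (charAt, substring, reversing a char[]) can split such a pair in half, which is the kind of breakage the "isn't quite fixed-width" remark is warning about.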

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1

