OASIS Mailing List Archives





   Re: [xml-dev] XML 1.1 grinds to halt?


From: "Tim Bray" <tbray@textuality.com>
> The point about doing strings, not characters, is well-taken, and one of
> the things in the W3C i18n draft that gave me an "aha" moment.  On the
> other hand, I think that when I say a "Unicode character", that has a
> very well-defined semantic, and COMBINING UMLAUT is one while codepoints
> from the surrogate blocks aren't, and any API that doesn't make that
> clear is, well, wrong.  Put another way, something that is a Unicode
> character in UTF-16 should also be a character in UTF-8 and UTF-32,
> which the surrogates aren't, so they are just not characters in any
> meaningful sense of the word.

I'm puzzled. What is the "aha moment" here? Your point seems to be that Java
char != Unicode character. True. Exactly like UTF-8 octet != Unicode
character. The fact that half a surrogate pair is not a Unicode character
doesn't seem like breaking news.
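
To make the comparison concrete, here is a minimal sketch in Java. It rests on assumptions that are not in the message above: the class name SurrogateDemo and its helper methods are hypothetical, U+1D11E (MUSICAL SYMBOL G CLEF) is just a convenient character outside the Basic Multilingual Plane, and the code-point methods it calls were added to the JDK in Java 5 (StandardCharsets in Java 7). It counts the same one-character string three ways: in UTF-16 code units (Java chars), in Unicode code points, and in UTF-8 octets.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateDemo {
    // Hypothetical sample: U+1D11E MUSICAL SYMBOL G CLEF, a single Unicode
    // character that lies outside the Basic Multilingual Plane.
    static final String CLEF = new String(Character.toChars(0x1D11E));

    // Number of Java chars, i.e. UTF-16 code units.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of octets in the UTF-8 encoding.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println("chars:       " + charCount(CLEF));
        System.out.println("code points: " + codePointCount(CLEF));
        System.out.println("UTF-8 bytes: " + utf8Length(CLEF));
        // Each half of the pair is a surrogate code unit, not a character.
        System.out.println("high surrogate: " + Character.isHighSurrogate(CLEF.charAt(0)));
        System.out.println("low surrogate:  " + Character.isLowSurrogate(CLEF.charAt(1)));
    }
}
```

The string holds one character but two chars and four octets, so neither the char count nor the octet count is a character count; only the code-point count is.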

Do you mean to say that using UTF-16 as the character encoding in a
programming language is broken by design? In the perfect language of your own
design, would you make the "char" type 32 bits? Is that what this is all about?




Copyright 2001 XML.org. This site is hosted by OASIS