


   Re: [xml-dev] XML Max Character Value


On Aug 13, 2005, at 14:19, Alan Gutierrez wrote:

>     Am I seeing that with Unicode in Java, you need to work with
>     String and not with individual char? That puts a dent in my
>     algorithm, which advanced along the characters in the string.

It depends on what exactly you are doing. A Java char is not a Unicode 
character but a UTF-16 code unit. The values \u0000 and \uFFFF should 
never occur in XML and can be used as sentinels if your algorithm works 
on UTF-16 code units. For the purpose of indexing text, working on 
UTF-16 code units as opposed to working on Unicode characters may well 
be good enough. In that case, a surrogate pair can be treated as two 
adjacent "characters". (Note that even when operating on UTF-32, you 
can have tightly-coupled characters when there is a base character 
followed by combining marks, so working on Unicode characters does not 
buy you inter-character independence.)
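A minimal Java sketch of the two approaches, for illustration only. The class and method names (CodeUnitWalk, countCodeUnits, countCodePoints) are hypothetical, and appending U+FFFF as an end-of-text sentinel is just one assumed way to use it. The sample string holds one surrogate pair and one base character followed by a combining mark, so it shows both kinds of coupling mentioned above.

```java
public class CodeUnitWalk {
    // U+FFFF is excluded from XML's Char production, so it cannot
    // collide with text that came out of a well-formed document.
    static final char SENTINEL = '\uFFFF';

    // Advances one UTF-16 code unit (Java char) at a time until the
    // sentinel is reached. A surrogate pair counts as two steps.
    static int countCodeUnits(String text) {
        String buffer = text + SENTINEL;
        int i = 0;
        while (buffer.charAt(i) != SENTINEL) {
            i++;
        }
        return i;
    }

    // Advances one Unicode code point at a time. A surrogate pair
    // counts as one step, but a combining mark is still its own step.
    static int countCodePoints(String text) {
        int count = 0;
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            i += Character.charCount(codePoint);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a', U+1D11E (written as a surrogate pair), 'e', U+0301 (combining acute)
        String sample = "a\uD834\uDD1Ee\u0301";
        System.out.println(countCodeUnits(sample));  // prints 5
        System.out.println(countCodePoints(sample)); // prints 4
    }
}
```

Even the code point walk sees "e" and its combining accent as two separate items, which is the point of the parenthetical above: neither granularity gives fully independent characters.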

Henri Sivonen



Copyright 2001 XML.org. This site is hosted by OASIS