xml-dev - Re: [xml-dev] XML Max Character Value

To: Alan Gutierrez <alan-xml-dev@engrm.com>
Subject: Re: [xml-dev] XML Max Character Value
From: Henri Sivonen <hsivonen@iki.fi>
Date: Sun, 14 Aug 2005 13:15:13 +0300
Cc: XML Developers List <xml-dev@lists.xml.org>
In-reply-to: <20050813111909.GD4299@maribor.izzy.net>
References: <075E1759251CCB49ABF05D8F742AFE270695FDF6@RED-MSG-50.redmond.corp.microsoft.com> <42FD9944.8010209@objfac.com> <20050813111909.GD4299@maribor.izzy.net>

On Aug 13, 2005, at 14:19, Alan Gutierrez wrote:

>     Am I seeing that with Unicode in Java, you need to work with
>     String and not with individual char? That puts a dent in my
>     algorithm, which advanced along the characters in the string.

It depends on what exactly you are doing. A Java char is not a Unicode 
character but a UTF-16 code unit. The values \u0000 and \uFFFF are not 
legal XML characters, so they never occur in well-formed XML and can be 
used as sentinels if your algorithm works on UTF-16 code units. For the 
purpose of indexing text, working on 
UTF-16 code units as opposed to working on Unicode characters may well 
be good enough. In that case, a surrogate pair can be treated as two 
adjacent "characters". (Note that even when operating on UTF-32, you 
can have tightly-coupled characters when there is a base character 
followed by combining marks, so working on Unicode characters does not 
buy you inter-character independence.)
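
To make the distinction concrete, here is a minimal Java sketch, not taken 
from the thread. The class and method names (CodeUnitDemo, codePoints, 
countSurrogatePairs) are hypothetical, and using \uFFFF as an end marker is 
only one possible way to apply the sentinel idea. It assumes Java 5 or later 
for the code point methods on String and Character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; all names here are illustrative assumptions.
final class CodeUnitDemo {
    // U+FFFF is not a legal XML character, so it can mark the end of a buffer.
    static final char SENTINEL = '\uFFFF';

    // Decodes the string into Unicode code points, consuming one or two
    // UTF-16 code units (Java chars) per step.
    static List<Integer> codePoints(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.add(cp);
            i += Character.charCount(cp); // 1 for BMP, 2 for a surrogate pair
        }
        return out;
    }

    // Scans code units in a sentinel-terminated buffer. Because the sentinel
    // sits at buf[length], reading buf[i + 1] is always in bounds while
    // buf[i] is not the sentinel. Assumes the input itself contains no
    // U+FFFF, which holds for text taken from well-formed XML.
    static int countSurrogatePairs(String s) {
        char[] buf = new char[s.length() + 1];
        s.getChars(0, s.length(), buf, 0);
        buf[s.length()] = SENTINEL;

        int pairs = 0;
        int i = 0;
        while (buf[i] != SENTINEL) {
            if (Character.isHighSurrogate(buf[i])
                    && Character.isLowSurrogate(buf[i + 1])) {
                pairs++;
                i += 2; // skip both halves of the pair
            } else {
                i++;
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String clef = "a\uD834\uDD1Eb"; // 'a', U+1D11E, 'b'
        System.out.println(clef.length());             // UTF-16 code units
        System.out.println(codePoints(clef).size());   // Unicode code points
        System.out.println(countSurrogatePairs(clef)); // surrogate pairs

        String accented = "e\u0301"; // 'e' followed by a combining acute accent
        System.out.println(codePoints(accented).size());
    }
}
```

For the string "a\uD834\uDD1Eb", the char-based length counts four code 
units while the code point walk yields three characters. The string 
"e\u0301" still decodes to two code points even though it displays as one 
accented letter, which is the combining-mark coupling mentioned above.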

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
