   Re: [xml-dev] XML Max Character Value

* Bob Foster <bob@objfac.com> [2005-08-13 02:55]:

> Alan Gutierrez wrote:

> >     I'm implementing a B-Tree to index XML documents. I'd like
> >     to use a maximum character value as a boundary, or failing that a
> >     minimum character value.

> I believe the current Unicode character range, and the one that was
> effective for the XML 1.0 standard, is 0x20-0x10FFFF (note 21 bits) plus
> the control characters '\t', '\n' and '\r', and minus the surrogate pair
> range and 0xFFFF and 0xFFFE. The fact that Java doesn't have much support for
> the surrogate pairs, which are the only way to express character values 
> greater than 0xFFFF, doesn't mean they won't appear in XML documents.
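
    For reference, the range described above can be written as a
    small test on a full code point. This is only a sketch of the
    Char production from the XML 1.0 recommendation; the class and
    method names (XmlChars, isXmlChar) are made up for illustration
    and do not come from any library.

```java
// Hypothetical helper; the names are illustrative, not from the thread.
final class XmlChars {
    private XmlChars() {}

    /** Returns true if cp is allowed by the XML 1.0 Char production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD   // tab, LF, CR
            || (cp >= 0x20 && cp <= 0xD7FF)          // up to the surrogate block
            || (cp >= 0xE000 && cp <= 0xFFFD)        // skips 0xFFFE and 0xFFFF
            || (cp >= 0x10000 && cp <= 0x10FFFF);    // supplementary planes
    }
}
```

    Note that the argument is an int code point, not a 16-bit char,
    so values above 0xFFFF can be tested directly.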

    That gives me something to Google: "surrogate pairs". I see
    Jaxen has some code to convert them.

    Am I right that with Unicode in Java you need to work with
    String and not with individual char values? That puts a dent in my
    algorithm, which advances along the characters in the string.
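
    One possible way to keep a character-by-character walk is to
    step by code point instead of by char. The sketch below assumes
    Java 5 or later, where String.codePointAt, String.codePointCount
    and Character.charCount are available; the class name
    CodePointWalker is hypothetical.

```java
// Hypothetical helper; assumes Java 5+ for the code point methods.
final class CodePointWalker {
    private CodePointWalker() {}

    /** Returns the code points of s in order, joining surrogate pairs. */
    static int[] codePoints(String s) {
        int[] out = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // reads one or two chars
            out[n++] = cp;
            i += Character.charCount(cp);   // advance by 1 or 2 chars
        }
        return out;
    }
}
```

    With this, the algorithm still advances along the string, but
    each step yields an int that may be greater than 0xFFFF.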

> So the answer is, no there's no single 16-bit maximum character value. 
> The test requires access to at least the next character and a little code.
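
    The "next character and a little code" could look roughly like
    the following, which combines a high and low surrogate by hand
    using the standard UTF-16 arithmetic. The class name
    SurrogateDecoder is invented here, and an unpaired surrogate is
    simply returned as-is, which is an assumption about what the
    caller wants.

```java
// Hypothetical helper showing the UTF-16 pairing arithmetic by hand.
final class SurrogateDecoder {
    private SurrogateDecoder() {}

    /** Decodes the code point starting at text[i], peeking at text[i + 1]. */
    static int codePointAt(char[] text, int i) {
        char c = text[i];
        // A high surrogate needs the next char to complete the value.
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < text.length) {
            char next = text[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF) {
                return 0x10000 + ((c - 0xD800) << 10) + (next - 0xDC00);
            }
        }
        return c; // ordinary 16-bit char, or an unpaired surrogate
    }
}
```

    The largest value this can produce is 0x10FFFF, from the pair
    0xDBFF 0xDFFF.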

    Is zero the absolute minimum? If so, I could build reverse indices.

    Thanks for your help, Bob, Derek, and Robert. I'm not getting
    any feedback on comp.lang.java.programmer.

Alan Gutierrez - alan@engrm.com
    - http://engrm.com/blogometer/index.html
    - http://engrm.com/blogometer/rss.2.0.xml

