* Bob Foster <bob@objfac.com> [2005-08-13 02:55]:
> Alan Gutierrez wrote:
> > I'm implementing a B-Tree to index XML documents. I'd like
> > to use the maximum character value as a boundary, or failing that the
> > minimum character value.
> I believe the current Unicode character range, and the one that was
> effective for the XML 1.0 standard, is 0x20-0x10FFFF (note 21 bits), plus
> the control characters '\t', '\n' and '\r', minus the surrogate range
> (0xD800-0xDFFF) and 0xFFFE and 0xFFFF. The fact that Java doesn't have much
> support for surrogate pairs, which are the only way to express character
> values greater than 0xFFFF in UTF-16 (and therefore in a Java String),
> doesn't mean they won't appear in XML documents.
It gives me something to Google: "surrogate pairs". I see that
Jaxen has some code to convert them.
Am I right that, with Unicode in Java, you need to work with a
String and not with individual chars? That puts a dent in my
algorithm, which advances along the characters in the string.
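A char-by-char loop does not necessarily have to go: since Java 5, String.codePointAt and Character.charCount let the loop step over whole code points, joining a high and low surrogate into one value. A minimal sketch, with hypothetical class and method names:

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointWalk {
    // Walk a String one Unicode code point at a time instead of one char.
    // A supplementary character (above 0xFFFF) is stored as two chars,
    // a high surrogate followed by a low surrogate; codePointAt joins them.
    static List<Integer> codePoints(String s) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            result.add(cp);
            i += Character.charCount(cp); // 2 for a surrogate pair, else 1
        }
        return result;
    }

    public static void main(String[] args) {
        // "a", U+1D11E MUSICAL SYMBOL G CLEF, "b"
        String s = "a" + new String(Character.toChars(0x1D11E)) + "b";
        System.out.println(s.length());           // 4 chars
        System.out.println(codePoints(s).size()); // 3 code points
    }
}
```

The algorithm keeps its shape; only the step size changes from a fixed 1 to the width of the current code point.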
> So the answer is, no there's no single 16-bit maximum character value.
> The test requires access to at least the next character and a little code.
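One consequence for B-Tree boundaries: String.compareTo orders by 16-bit char, so a surrogate pair (0xD800-0xDFFF) sorts below BMP characters 0xE000-0xFFFF, and "\uFFFD" compares greater than U+10FFFF, the highest code point. A sketch of a code-point comparator follows, assuming keys should be ordered by code point; the names are hypothetical and the lambda needs Java 8 or later.

```java
import java.util.Comparator;

public class CodePointOrder {
    // Compare two strings by Unicode code point rather than by UTF-16 char.
    static final Comparator<String> BY_CODE_POINT = (a, b) -> {
        int i = 0;
        while (i < a.length() && i < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(i);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            // Equal code points have equal width, so one index serves both strings.
            i += Character.charCount(ca);
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return Integer.compare(a.length(), b.length());
    };

    public static void main(String[] args) {
        String max = new String(Character.toChars(0x10FFFF)); // highest code point
        String bmpHigh = "\uFFFD";
        System.out.println(bmpHigh.compareTo(max) > 0);              // true: char order
        System.out.println(BY_CODE_POINT.compare(bmpHigh, max) < 0); // true: code point order
    }
}
```

If every key is compared with String.compareTo consistently, UTF-16 order is still a valid total order for the index, just a different one; the comparator matters only when the boundary is chosen by code point value.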
Is zero the absolute minimum? If so I could build reverse indices.
Thanks for your help, Bob, Derek, and Robert. I'm not getting
any feedback at comp.lang.java.programmer.
--
Alan Gutierrez - alan@engrm.com
- http://engrm.com/blogometer/index.html
- http://engrm.com/blogometer/rss.2.0.xml