* 'Alan Gutierrez' <alan-xml-dev@engrm.com> [2005-08-14 23:12]:
> * Michael Kay <mike@saxonica.com> [2005-08-14 17:24]:
> > > -----Original Message-----
> > > From: Alan Gutierrez [mailto:alan-xml-dev@engrm.com]
> > > Sent: 13 August 2005 12:06
> > > To: Derek Denny-Brown
> > > Cc: xml-dev@lists.xml.org
> > > Subject: Re: [xml-dev] XML Max Character Value
> > >
> > > * Derek Denny-Brown <derekdb@microsoft.com> [2005-08-13 01:29]:
> > >
> > > > In Java, 0xFFFE or 0xFFFF should work. They aren't strictly
> > > > the max Unicode character for XML, but since Java represents
> > > > Unicode as UTF-16 but doesn't really provide much support for
> > > > surrogate pairs (last I checked), those should work. Hm...
> > > > Eclipse tells me that there is Character.MAX_VALUE. Use at
> > > > your own risk.
> > >
> > > I am using it to design the algorithm. I'm concerned about
> > > what to do if Unicode requires multiple Java chars (a surrogate
> > > pair) for a single character. It's perplexing.
> > >
> > > > Reading up on Unicode is also recommended though...
> > > > internationalization is far, far more complicated than you
> > > > ever imagined. I know people who get the shakes if you just
> > > > mention "Turkish 'I'" in their presence. (mild
> > > > exaggeration...)
> > >
> > > I have no illusions about the complexity. I'd simply hoped that
> > > they would have made a hard and fast rule about min and
> > > max values.
>
> > In XSLT 2.0, the collation used by xsl:key is not necessarily
> > Unicode codepoint order. To build an index, you need to store the
> > key value as a sequence of collation units, not as a sequence of
> > Java chars or Unicode codepoints. So I suspect that what you
> > really want is the highest collation unit in the particular
> > collation used for the key in question.
> This is a B-Tree implementation. The words 'collation unit' are
> heartening: I'm looking to step through the string comparison
> myself, using it to determine which branch to take in the B-Tree.
>
> I'm storing partial strings in tiers for branching. Partial
> means, just enough of the string to indicate which branch to
> take. My design stores a character and index pair as a branch
> node, so I bump along the search string branching along the way.
>
I've found CollationKey.toByteArray() in java.text.
http://java.sun.com/j2se/1.4.2/docs/api/java/text/CollationKey.html#toByteArray()
It seems to do what I need: it creates a sequence of units along
which I can advance and compare.
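
For what it's worth, here is a minimal sketch of how that might look.
It is an illustration under assumptions, not my actual B-Tree code: the
class name CollationUnits and the helpers units() and compareUnits()
are made up for this example, and Locale.ENGLISH is an arbitrary choice
of collation. It relies on the documented behaviour that the byte
arrays of two comparable keys can be compared directly, most
significant byte first, so each byte is treated as unsigned.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class CollationUnits {

    // Returns the collation key of s, under the given collator, as a byte sequence.
    static byte[] units(Collator collator, String s) {
        return collator.getCollationKey(s).toByteArray();
    }

    // Walks two unit sequences in step and reports the first difference.
    // Bytes are masked to 0..255 because Java bytes are signed.
    static int compareUnits(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        // One sequence is a prefix of the other: the shorter one sorts first.
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH); // assumed collation
        byte[] left = units(collator, "apple");
        byte[] right = units(collator, "banana");
        System.out.println(compareUnits(left, right));
    }
}
```

In a branch node, the index i at which the two sequences first differ
is the natural place to stop storing the partial key.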
--
Alan Gutierrez - alan@engrm.com
- http://engrm.com/blogometer/index.html
- http://engrm.com/blogometer/rss.2.0.xml