* 'Alan Gutierrez' <email@example.com> [2005-08-14 23:12]:
> * Michael Kay <firstname.lastname@example.org> [2005-08-14 17:24]:
> > > -----Original Message-----
> > > From: Alan Gutierrez [mailto:email@example.com]
> > > Sent: 13 August 2005 12:06
> > > To: Derek Denny-Brown
> > > Cc: firstname.lastname@example.org
> > > Subject: Re: [xml-dev] XML Max Character Value
> > >
> > > * Derek Denny-Brown <email@example.com> [2005-08-13 01:29]:
> > >
> > > > In Java, 0xFFFE or 0xFFFF should work. They aren't strictly
> > > > the max Unicode character for XML, but since Java represents
> > > > Unicode as UTF-16 but doesn't really provide much support for
> > > > surrogate pairs (last I checked), those should work. Hm...
> > > > Eclipse tells me that there is Character.MAX_VALUE. Use at
> > > > your own risk.
> > >
> > > I am using it to design the algorithm. I'm concerned about what
> > > to do when Unicode needs more than one Java char (a surrogate
> > > pair) to represent a single character. It's perplexing.
> > >
> > > > Reading up on Unicode is also recommended though...
> > > > internationalization is far, far more complicated than you
> > > > ever imagined. I know people who get the shakes if you just
> > > > mention "Turkish 'I'" in their presence. (mild
> > > > exaggeration...)
> > >
> > > I have no illusions about the complexity. I'd simply hoped that
> > > they would have made a hard and fast rule about min and
> > > max values.
> > In XSLT 2.0, the collation used by xsl:key is not necessarily
> > Unicode codepoint order. To build an index, you need to store the
> > key value as a sequence of collation units, not as a sequence of
> > Java chars or Unicode codepoints. So I suspect that what you
> > really want is the highest collation unit in the particular
> > collation used for the key in question.
> This is a B-Tree implementation. The words 'collation unit' are
> heartening. I want to step through the string comparison myself,
> using each step to decide which branch to take in the B-Tree.
> I'm storing partial strings in tiers for branching. Partial
> means just enough of the string to indicate which branch to
> take. My design stores a character and index pair as a branch
> node, so I bump along the search string, branching along the way.
I've found CollationKey.toByteArray() in java.text.
It seems to do what I need: it creates a sequence of units along
which I can advance and compare.
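
A minimal sketch of that idea, assuming an English-locale Collator
stands in for whatever collation the key actually uses. The class
name CollationKeyDemo and the helpers keyBytes and compareUnsigned
are made up for illustration; only Collator, CollationKey and
toByteArray() come from java.text. The bytes are compared as
unsigned values, most significant byte first, which is how the
toByteArray() documentation says the array is organized.

```java
import java.text.Collator;
import java.util.Locale;

class CollationKeyDemo {
    /** Returns the collation key of s under the given collator as a byte sequence. */
    static byte[] keyBytes(Collator collator, String s) {
        return collator.getCollationKey(s).toByteArray();
    }

    /**
     * Walks two key byte arrays one unit at a time, comparing bytes as
     * unsigned values. If one array is a prefix of the other, the
     * shorter one sorts first.
     */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff; // first differing unit decides the branch
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Assumption: English collation; a real index would use the key's own collation.
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        byte[] left = keyBytes(collator, "apple");
        byte[] right = keyBytes(collator, "banana");
        System.out.println("byte-wise:  " + Integer.signum(compareUnsigned(left, right)));
        System.out.println("collator:   " + Integer.signum(collator.compare("apple", "banana")));
    }
}
```

The two printed signs should agree. A branch node could then hold a
byte and index pair rather than a char and index pair, though I
haven't worked out whether that fits the tiered design.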
Alan Gutierrez - firstname.lastname@example.org