* Michael Kay <mike@saxonica.com> [2005-08-14 17:24]:
> > -----Original Message-----
> > From: Alan Gutierrez [mailto:alan-xml-dev@engrm.com]
> > Sent: 13 August 2005 12:06
> > To: Derek Denny-Brown
> > Cc: xml-dev@lists.xml.org
> > Subject: Re: [xml-dev] XML Max Character Value
> >
> > * Derek Denny-Brown <derekdb@microsoft.com> [2005-08-13 01:29]:
> >
> > > In java, 0xFFFE or 0xFFFF should work. They aren't strictly
> > > the max Unicode character for XML, but since Java represents
> > > Unicode as utf-16 but doesn't really provide much support for
> > > surrogate pairs (last I checked), those should work. Hm..
> > > Eclipse tells me that there is Character.MAX_VALUE. Use at
> > > your own risk.
> >
> > I am using it to design the algorithm. Concerned about what to
> > do if Unicode requires multiple characters for a single
> > character. It's perplexing.
> >
> > > Reading up on Unicode is also recommended though...
> > > internationalization is far, far more complicated than you
> > > ever imagined. I know people who get the shakes if you just
> > > mention "Turkish 'I'" in their presence. (mild
> > > exaggeration...)
> >
> > I have no illusions about the complexity. I'd simply hoped that
> > they would have made a hard and fast rule about min and
> > max values.
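On the question of hard minimum and maximum values: the XML 1.0 Char production does fix them, and it excludes U+FFFE and U+FFFF while allowing everything up to U+10FFFF. What follows is a minimal sketch, assuming Java 5 or later for the code point methods on Character and String. The class name CodePointBounds and its helper methods are made up for illustration. It also shows that comparing Java chars (UTF-16 code units) does not always agree with comparing code points once surrogate pairs are involved.

```java
final class CodePointBounds {
    /** True if cp matches the XML 1.0 Char production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Compares two strings code point by code point, not char by char. */
    static int compareByCodePoint(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            // A supplementary code point occupies two chars (a surrogate pair).
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // Whichever string still has input left sorts later.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        String bmp = "\uFFFD";                                 // one char
        String supplementary = new String(Character.toChars(0x10000)); // two chars

        System.out.println(isXmlChar(Character.MAX_VALUE));      // U+FFFF: false
        System.out.println(isXmlChar(Character.MAX_CODE_POINT)); // U+10FFFF: true
        System.out.println(supplementary.length());              // 2
        // Code unit order puts U+10000 (0xD800 0xDC00) before U+FFFD ...
        System.out.println(bmp.compareTo(supplementary) > 0);    // true
        // ... while code point order puts it after.
        System.out.println(compareByCodePoint(bmp, supplementary) < 0); // true
    }
}
```

So Character.MAX_VALUE works as an upper bound only for char-by-char comparison; an index ordered by code point would need Character.MAX_CODE_POINT instead.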
> In XSLT 2.0, the collation used by xsl:key is not necessarily
> Unicode codepoint order. To build an index, you need to store the
> key value as a sequence of collation units, not as a sequence of
> Java chars or Unicode codepoints. So I suspect that what you
> really want is the highest collation unit in the particular
> collation used for the key in question.
I don't need a sentinel (maximum) value at this point. I've
turned the equality tests around so they start from an implicit
zero.
Thus, for the sake of <xsl:key/>...
> (Actually, xsl:key only supports equality semantics, not ordering
> semantics. But I can see that you probably want to implement
> indexes that also support ordering semantics. It's likely that
> these too would need to be collation-sensitive.)
...I'm only using the sort order to search for and find the
values in the tree. Any sort will do. Collation in <xsl:key/> is
only applied after the keyed nodes are recovered, or at least
that's my understanding.
Soon after, I'm going to want to support ordering as well, and
attempt to integrate that with <xsl:sort/>. (Perhaps XQuery can
take advantage of ordered indices; I don't know.)
This is a B-Tree implementation. The words 'collation unit' are
heartening: I'm looking to advance the string comparison myself,
using it to determine which branch to take in the B-Tree.
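To make the difference between codepoint order and collation order concrete, here is a minimal sketch using the JDK's java.text.Collator and CollationKey. It is not how Saxon represents collation units, and the class name CollationKeyDemo and its helpers are made up for illustration. The byte-array form of a key is something a file-backed index could store and compare without a collator at hand, assuming the bytes are compared as unsigned values, most significant byte first.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

final class CollationKeyDemo {
    /** Builds the sort key for value under the given collation. */
    static byte[] keyBytes(Collator collator, String value) {
        CollationKey key = collator.getCollationKey(value);
        return key.toByteArray();
    }

    /** Lexicographic comparison of two keys, treating bytes as unsigned. */
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        Collator english = Collator.getInstance(Locale.ENGLISH);

        // By codepoint, "B" (U+0042) sorts before "a" (U+0061).
        System.out.println("B".compareTo("a") < 0);

        // Under an English collation, "a" sorts before "B".
        byte[] a = keyBytes(english, "a");
        byte[] b = keyBytes(english, "B");
        System.out.println(compareKeys(a, b) < 0);
    }
}
```

Keys built by different collators are not comparable with each other, so an index built this way would be tied to one collation.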
I'm storing partial strings in tiers for branching. Partial
means just enough of the string to indicate which branch to
take. My design stores a character and index pair as a branch
node, so I bump along the search string, branching along the way.
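One way to read the character and index pair is sketched below. It is a guess at the shape, not the actual implementation: it uses a two-way branch where a real B-Tree node would hold many entries, it keeps everything in memory, and the names PartialKeyBranch, Node and sampleTree are hypothetical. Each branch looks at a single position in the search string, and a position past the end of the string reads as zero. Because only part of the string is inspected on the way down, the leaf has to confirm the match with a full comparison.

```java
final class PartialKeyBranch {
    /** Either a branch (leaf == null) or a leaf holding a full key. */
    static final class Node {
        final int index;   // position in the search string to inspect
        final char ch;     // strings whose char at index is < ch go left
        final Node left;
        final Node right;
        final String leaf;

        Node(int index, char ch, Node left, Node right) {
            this.index = index;
            this.ch = ch;
            this.left = left;
            this.right = right;
            this.leaf = null;
        }

        Node(String leaf) {
            this.index = -1;
            this.ch = 0;
            this.left = null;
            this.right = null;
            this.leaf = leaf;
        }
    }

    /** Reads past the end of the string as an implicit zero. */
    static char charAt(String key, int index) {
        return index < key.length() ? key.charAt(index) : '\0';
    }

    /** Walks the branches, then verifies the candidate leaf in full. */
    static boolean contains(Node root, String key) {
        Node node = root;
        while (node.leaf == null) {
            node = charAt(key, node.index) < node.ch ? node.left : node.right;
        }
        return node.leaf.equals(key);
    }

    /** Hand-built tree over the keys "car", "cart" and "dog". */
    static Node sampleTree() {
        return new Node(0, 'd',
            new Node(3, 't', new Node("car"), new Node("cart")),
            new Node("dog"));
    }

    public static void main(String[] args) {
        Node root = sampleTree();
        System.out.println(contains(root, "cart")); // true
        System.out.println(contains(root, "car"));  // true
        System.out.println(contains(root, "cat"));  // false
    }
}
```

The comparison here is on Java chars, so it follows UTF-16 code unit order; that is enough for equality lookups but not for a collation-sensitive ordering.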
This is FYI, for the group...
I've written a document object model that's file backed, and I'm
using it with Saxon for queries, and I've put together my own
XUpdate implementation for node surgery.
I want to provide Saxon with a file backed index.
--
Alan Gutierrez - alan@engrm.com
- http://engrm.com/blogometer/index.html
- http://engrm.com/blogometer/rss.2.0.xml