
   Re: [xml-dev] XML Max Character Value


* Michael Kay <mike@saxonica.com> [2005-08-14 17:24]:
> > -----Original Message-----
> > From: Alan Gutierrez [mailto:alan-xml-dev@engrm.com] 
> > Sent: 13 August 2005 12:06
> > To: Derek Denny-Brown
> > Cc: xml-dev@lists.xml.org
> > Subject: Re: [xml-dev] XML Max Character Value
> > 
> > * Derek Denny-Brown <derekdb@microsoft.com> [2005-08-13 01:29]:
> > 
> > > In Java, 0xFFFE or 0xFFFF should work.  They aren't strictly
> > > the max Unicode character for XML, but since Java represents
> > > Unicode as UTF-16 but doesn't really provide much support for
> > > surrogate pairs (last I checked), those should work.  Hm..
> > > Eclipse tells me that there is Character.MAX_VALUE.  Use at
> > > your own risk.
> >     
> >     I am using it to design the algorithm.  I'm concerned about
> >     what to do if Unicode requires multiple Java chars for a
> >     single character. It's perplexing.
> > 
> > > Reading up on Unicode is also recommended though...
> > > internationalization is far, far more complicated than you
> > > ever imagined.  I know people who get the shakes if you just
> > > mention "Turkish 'I'" in their presence.  (mild
> > > exaggeration...)
> > 
> >     I have no illusions about the complexity. I'd simply hoped that
> >     they would have made a hard and fast rule about min and 
> >     max values.
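
    To make the "multiple characters for a single character" worry
    concrete, here is a minimal sketch, assuming Java 5 or later. The
    class and method names are made up for illustration; the ranges in
    isXmlChar follow the Char production of XML 1.0. It shows that
    0xFFFE and 0xFFFF are not legal XML characters (which is why they
    can serve as a marker value), and that code points above 0xFFFF
    occupy two Java chars.

```java
public final class CodePointLimits {

    /** Number of Java chars (UTF-16 code units) needed for one code point. */
    public static int unitsFor(int codePoint) {
        return Character.charCount(codePoint);
    }

    /** True if the code point matches the Char production of XML 1.0. */
    public static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        // Largest value a single Java char can hold.
        System.out.println(Integer.toHexString(Character.MAX_VALUE));      // ffff
        // Largest Unicode code point.
        System.out.println(Integer.toHexString(Character.MAX_CODE_POINT)); // 10ffff

        // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the 16-bit range,
        // so it is stored as a surrogate pair.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(clef.length());                          // 2
        System.out.println(clef.codePointCount(0, clef.length()));  // 1

        System.out.println(isXmlChar(0xFFFF));    // false
        System.out.println(isXmlChar(0x10FFFF));  // true
    }
}
```

    Comparing strings char by char therefore compares UTF-16 code
    units, not code points, and the two orders can disagree for
    characters outside the 16-bit range.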

> In XSLT 2.0, the collation used by xsl:key is not necessarily
> Unicode codepoint order. To build an index, you need to store the
> key value as a sequence of collation units, not as a sequence of
> Java chars or Unicode codepoints. So I suspect that what you
> really want is the highest collation unit in the particular
> collation used for the key in question.
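
    One way to picture storing keys as collation units rather than
    chars is the sketch below. It is an assumption on my part that
    java.text.CollationKey is a reasonable stand-in for what Saxon
    means by a collation unit; the class and method names are
    hypothetical. The index stores the byte form of each key, and the
    bytes are compared as unsigned values, most significant byte
    first.

```java
import java.text.Collator;
import java.util.Locale;

public final class CollationKeySketch {

    /** Byte form of the collation key for a value; this is what the index would store. */
    public static byte[] indexKey(Collator collator, String value) {
        return collator.getCollationKey(value).toByteArray();
    }

    /** Compare two stored keys byte by byte, treating each byte as unsigned. */
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    /** An English collator that ignores case and accent differences. */
    public static Collator primaryEnglish() {
        Collator c = Collator.getInstance(Locale.ENGLISH);
        c.setStrength(Collator.PRIMARY);
        return c;
    }

    public static void main(String[] args) {
        Collator c = primaryEnglish();
        byte[] lower = indexKey(c, "apple");
        byte[] upper = indexKey(c, "APPLE");
        byte[] other = indexKey(c, "banana");
        // Under this collation the two spellings of "apple" are the same key.
        System.out.println(compareUnsigned(lower, upper) == 0);
        System.out.println(compareUnsigned(lower, other) < 0);
    }
}
```

    With keys stored this way, the highest value in the index depends
    on the collation, not on Character.MAX_VALUE.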

    I don't need a sentinel at this point. I've turned the equality
    tests around so they start from an implicit zero.

    Thus, for the sake of <xsl:key/>... 

> (Actually, xsl:key only supports equality semantics, not ordering
>    semantics.  But I can see that you probably want to implement
>    indexes that also support ordering semantics. It's likely that
>    these too would need to be collation-sensitive.)  

    ...I'm only using the sort in order to search and to find the
    values in the tree. Any sort will do. Collation in <xsl:key/> is
    only applied after the keyed nodes are recovered, or that's my
    understanding.

    Soon after, I'm going to want to support ordering as well, and
    attempt to integrate that with <xsl:sort/>. (Perhaps XQuery can
    take advantage of ordered indices; I don't know.)

    This is a B-Tree implementation. The words 'collation unit' are
    heartening; I'm looking to advance the string comparison myself,
    using it to determine which branch to take in the B-Tree.

    I'm storing partial strings in tiers for branching. Partial
    means just enough of the string to indicate which branch to
    take. My design stores a character and index pair as a branch
    node, so I bump along the search string, branching along the way.

    This is FYI, for the group...

    I've written a document object model that's file-backed, and I'm
    using it with Saxon for queries, and I've put together my own
    XUpdate implementation for node surgery.
    
    I want to provide Saxon with a file-backed index.

--
Alan Gutierrez - alan@engrm.com
    - http://engrm.com/blogometer/index.html
    - http://engrm.com/blogometer/rss.2.0.xml




 


Copyright 2001 XML.org. This site is hosted by OASIS