
   Re: [xml-dev] XML Max Character Value


* 'Alan Gutierrez' <alan-xml-dev@engrm.com> [2005-08-14 23:12]:
> * Michael Kay <mike@saxonica.com> [2005-08-14 17:24]:
> > > -----Original Message-----
> > > From: Alan Gutierrez [mailto:alan-xml-dev@engrm.com] 
> > > Sent: 13 August 2005 12:06
> > > To: Derek Denny-Brown
> > > Cc: xml-dev@lists.xml.org
> > > Subject: Re: [xml-dev] XML Max Character Value
> > > 
> > > * Derek Denny-Brown <derekdb@microsoft.com> [2005-08-13 01:29]:
> > > 
> > > > In Java, 0xFFFE or 0xFFFF should work.  They aren't strictly
> > > > the max Unicode character for XML, but since Java represents
> > > > Unicode as UTF-16 but doesn't really provide much support for
> > > > surrogate pairs (last I checked), those should work.  Hm...
> > > > Eclipse tells me that there is Character.MAX_VALUE.  Use at
> > > > your own risk.
> > >     
> > >     I am using it to design the algorithm. I'm concerned about what
> > >     to do if Unicode requires multiple Java chars for a single
> > >     character. It's perplexing.
> > > 
> > > > Reading up on Unicode is also recommended though...
> > > > internationalization is far, far more complicated than you
> > > > ever imagined.  I know people who get the shakes if you just
> > > > mention "Turkish 'I'" in their presence.  (mild
> > > > exaggeration...)
> > > 
> > >     I have no illusions about the complexity. I'd simply hoped that
> > >     they would have made a hard and fast rule about min and 
> > >     max values.
> 
> > In XSLT 2.0, the collation used by xsl:key is not necessarily
> > Unicode codepoint order. To build an index, you need to store the
> > key value as a sequence of collation units, not as a sequence of
> > Java chars or Unicode codepoints. So I suspect that what you
> > really want is the highest collation unit in the particular
> > collation used for the key in question.

>     This is a B-Tree implementation. The words 'collation unit' are
>     heartening. I'm looking to advance the string comparison myself,
>     using it to determine which branch to take in the B-Tree.
> 
>     I'm storing partial strings in tiers for branching. Partial
>     means just enough of the string to indicate which branch to
>     take. My design stores a character and index pair as a branch
>     node, so I bump along the search string, branching as I go.
> 

    I've found CollationKey.toByteArray() in java.text.

    http://java.sun.com/j2se/1.4.2/docs/api/java/text/CollationKey.html#toByteArray()

    It seems to do what I need: it creates a sequence of units along
    which I can advance and compare.
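
    A minimal sketch of the idea, not the actual B-Tree code. The
    class name CollationKeyDemo and the helpers keyBytes and
    compareUnsigned are made up for illustration, and Locale.ENGLISH
    is an arbitrary choice. It assumes the key bytes are compared as
    unsigned values, most significant byte first, which is how the
    CollationKey documentation describes the array.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

public class CollationKeyDemo {

    // Hypothetical helper: the collation key bytes for s under the given collator.
    static byte[] keyBytes(Collator collator, String s) {
        CollationKey key = collator.getCollationKey(s);
        return key.toByteArray();
    }

    // Hypothetical helper: lexicographic comparison that treats each byte
    // as an unsigned value (0..255). A shorter array that is a prefix of
    // the longer one sorts first.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // Assumption: any Collator would do; English is only an example.
        Collator collator = Collator.getInstance(Locale.ENGLISH);

        byte[] apple = keyBytes(collator, "apple");
        byte[] banana = keyBytes(collator, "Banana");

        // The sign of the byte-wise comparison should agree with the
        // sign of the collator's own comparison of the two strings.
        System.out.println(Integer.signum(compareUnsigned(apple, banana)));
        System.out.println(Integer.signum(collator.compare("apple", "Banana")));
    }
}
```

    Stepping through the byte array one unit at a time is what would
    let a branch node hold just a unit and an index, though whether
    that fits the tier layout is still an open question.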

--
Alan Gutierrez - alan@engrm.com
    - http://engrm.com/blogometer/index.html
    - http://engrm.com/blogometer/rss.2.0.xml




 


Copyright 2001 XML.org. This site is hosted by OASIS