* Tom Moog <tmoog@sarvega.com> [2005-08-13 21:53]:
> On Aug 13 07:19, Alan Gutierrez <alan-xml-dev@engrm.com> wrote:
> >
> > Subject: Re: [xml-dev] XML Max Character Value
> >
> > * Bob Foster <bob@objfac.com> [2005-08-13 02:55]:
> >
> > > Alan Gutierrez wrote:
> >
> > > > I'm implementing a B-Tree to index XML documents. I'd like
> > > > to use a maximum character value as a boundary, or failing that, a
> > > > minimum character value.
> >
> > > I believe the current Unicode character range, and the one that was
> > > effective for the XML 1.0 standard, is 0x20-0x10000 (note 17 bits) plus
> > > the control characters, '\t' and '\n' and minus the surrogate pair range
> > > and 0xFFFF and 0xFFFE.
> The maximum for xml is 0x10ffff.
> You may want to think in terms of utf-8 encoding.
> One characteristic of utf-8 is that it preserves the order of
> strings. In other words, if code(A) < code(B), then utf-8(A) <
> utf-8(B) when compared as a sequence of unsigned 8 bit bytes.
That sounds good. For text data like XSLT dates, '2005-08-10',
where locale and collation might not matter, I'll want to use the
simplest, smallest representation possible. Maybe not the best
example, since there is a binary representation.
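
The order-preserving property of UTF-8 described in the quoted message can be checked with a small sketch. The sample strings and the helper name `utf8_key` are made up for illustration; the only assumption is that keys are compared as raw unsigned bytes, which is how Python compares `bytes` objects.

```python
# Sketch: sorting by Unicode code point and sorting by raw UTF-8 bytes
# give the same order. Sample values are illustrative only.

def utf8_key(s):
    """Return the UTF-8 encoding of s; bytes compare as unsigned octets."""
    return s.encode("utf-8")

samples = [
    "2005-08-13",
    "2005-08-10",
    "a",
    "ab",
    "a\u00e9",       # 'a' followed by e-acute (2-byte UTF-8 sequence)
    "\u00e9",        # e-acute alone
    "\u20ac",        # euro sign (3-byte sequence)
    "\U00010000",    # first code point above the BMP (4-byte sequence)
    "\U0010FFFF",    # highest Unicode code point
]

# Python compares str values code point by code point.
by_codepoint = sorted(samples)

# Compare the same strings as sequences of unsigned 8-bit bytes.
by_utf8 = sorted(samples, key=utf8_key)

assert by_codepoint == by_utf8
```

For fixed-width values such as '2005-08-10', byte order also matches chronological order, so no collation step is needed for that kind of key.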
In any case...
I've reworked my algorithm so that it starts from a head node
that is an implicit least value node. The conditionals only
apply to subsequent nodes, which are built from inserted values.
Thus, I've removed the need for a sentinel. I'll only ever be
testing against characters found within the XML document.
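
The idea of a head node that acts as an implicit least value can be sketched roughly as follows. This is not the B-Tree itself, only a sorted linked list used to illustrate the same trick; the class names `Node` and `SortedList` are hypothetical, and keys are assumed to be UTF-8 byte strings.

```python
# Sketch: a sorted list whose head node is treated as "less than everything".
# The head's key is never read, so no minimum or maximum character value
# is needed as a sentinel.

class Node:
    def __init__(self, key=None):
        self.key = key      # stays None on the head node only
        self.next = None


class SortedList:
    def __init__(self):
        # Implicit least-value node; it never takes part in a comparison.
        self.head = Node()

    def insert(self, key):
        prev = self.head
        # Only nodes after the head are compared, and those always hold
        # keys that were actually inserted.
        while prev.next is not None and prev.next.key < key:
            prev = prev.next
        node = Node(key)
        node.next = prev.next
        prev.next = node

    def keys(self):
        out = []
        node = self.head.next
        while node is not None:
            out.append(node.key)
            node = node.next
        return out


index = SortedList()
for text in ["2005-08-13", "\u20ac", "2005-08-10", "a"]:
    index.insert(text.encode("utf-8"))
```

Because the walk always starts at the head and compares only against `prev.next`, every comparison involves two real keys taken from the document.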
Thank you to everyone who responded. I'm sure I'm going to want to
ask more questions later about collation.
--
Alan Gutierrez - alan@engrm.com
- http://engrm.com/blogometer/index.html
- http://engrm.com/blogometer/rss.2.0.xml