xml-dev - Re: [xml-dev] XML Max Character Value


To: Tom Moog <tmoog@sarvega.com>
Subject: Re: [xml-dev] XML Max Character Value
From: Alan Gutierrez <alan-xml-dev@engrm.com>
Date: Sun, 14 Aug 2005 06:36:07 -0400
Cc: Bob Foster <bob@objfac.com>, Derek Denny-Brown <derekdb@microsoft.com>, xml-dev@lists.xml.org
In-reply-to: <1123984419.52112@sarvega.com>
Mail-followup-to: Tom Moog <tmoog@sarvega.com>, Bob Foster <bob@objfac.com>,Derek Denny-Brown <derekdb@microsoft.com>, xml-dev@lists.xml.org
References: <20050813111909.GD4299@maribor.izzy.net> <1123984419.52112@sarvega.com>
User-agent: Mutt/1.4.1i

* Tom Moog <tmoog@sarvega.com> [2005-08-13 21:53]:


> On Aug 13 07:19, Alan Gutierrez <alan-xml-dev@engrm.com> wrote:
> >
> > Subject: Re: [xml-dev] XML Max Character Value
> >
> > * Bob Foster <bob@objfac.com> [2005-08-13 02:55]:
> > 
> > > Alan Gutierrez wrote:
> > 
> > > >     I'm implementing a B-Tree to index XML documents. I'd like
> > > >     to use a maximum character value as a boundary, or failing that a
> > > >     minimum character value.
> > 
> > > I believe the current Unicode character range, and the one that was 
> > > effective for the XML 1.0 standard, is 0x20-0x10000 (note 17 bits) plus 
> > > the control characters, '\t' and '\n' and minus the surrogate pair range 
> > > and 0xFFFF and 0xFFFE.

> The maximum for xml is 0x10ffff.

> You may want to think in terms of utf-8 encoding.

> One characteristic of utf-8 is that it preserves the order of
> strings.  In other words, if code(A) < code(B), then utf-8(A) <
> utf-8(B) when compared as a sequence of unsigned 8 bit bytes.
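
A minimal sketch of the ordering property described above, assuming Python 3, where strings compare by code point and bytes compare as unsigned 8 bit values. The sample strings and the helper name utf8_key are made up for illustration and do not come from the thread.

```python
# Illustrative sample values (hypothetical), spanning 1- to 4-byte UTF-8 encodings.
samples = [
    "\U0010ffff",   # the XML maximum, 0x10ffff
    "\u20ac",       # three bytes in UTF-8
    "A",
    "\t",
    "\U00010000",   # first code point that needs four bytes
    "2005-08-10",
    "\u00e9",       # two bytes in UTF-8
    "\ufffd",
    " ",
]


def utf8_key(s: str) -> bytes:
    """Return the UTF-8 encoding; bytes objects compare byte by byte, unsigned."""
    return s.encode("utf-8")


# Python compares str values by code point, so this is code point order.
by_code_point = sorted(samples)

# Sorting on the encoded bytes uses plain unsigned byte comparison.
by_utf8_bytes = sorted(samples, key=utf8_key)

print(by_code_point == by_utf8_bytes)
print(utf8_key(chr(0x10FFFF)).hex())
```

If the property holds, both sorts give the same sequence, so a byte-wise comparison of the encoded keys can stand in for a code point comparison.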

    That sounds good. For text data like XSLT dates, '2005-08-10',
    where locale and collation might not matter, I'll want to use the
    simplest, smallest representation possible. Maybe not the best
    example, since there is a binary representation.

    In any case...

    I've reworked my algorithm so that it starts from a head node
    that is an implicit least value node. The conditionals only
    apply to subsequent nodes, which are built from inserted values.
    
    Thus, I've removed the need for a sentinel.  I'll only ever be
    testing against characters found within the XML document.
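
    The idea of a head node that acts as an implicit least value can
    be sketched roughly as follows. This is not the B-Tree code from
    the message, which is not shown in the thread; it is a simplified
    sorted linked list in Python, and the names Node, SortedIndex,
    insert and keys are hypothetical. The point is only that the head
    node's key is never compared, so no minimum or maximum character
    value is needed.

```python
class Node:
    """A list node; the head node carries no key at all."""

    def __init__(self, key=None):
        self.key = key
        self.next = None


class SortedIndex:
    """Sorted singly linked list whose head is an implicit least value."""

    def __init__(self):
        # The head is treated as less than every key, but it is never compared.
        self.head = Node()

    def insert(self, key: bytes) -> None:
        prev = self.head
        # Only nodes after the head are tested, and they hold inserted keys.
        while prev.next is not None and prev.next.key <= key:
            prev = prev.next
        node = Node(key)
        node.next = prev.next
        prev.next = node

    def keys(self) -> list:
        out = []
        node = self.head.next
        while node is not None:
            out.append(node.key)
            node = node.next
        return out


# Hypothetical usage: index a few strings by their UTF-8 bytes.
texts = ["2005-08-10", "\u20ac", "A", "2005-08-09"]
index = SortedIndex()
for text in texts:
    index.insert(text.encode("utf-8"))
print(index.keys())
```

    Because every comparison involves two inserted keys, the only
    characters ever tested are ones that actually occur in the data.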

    Thank you to everyone who responded. I'm sure I'm going to want
    to ask more questions later about collation.

--
Alan Gutierrez - alan@engrm.com
    - http://engrm.com/blogometer/index.html
    - http://engrm.com/blogometer/rss.2.0.xml
