Re: [xml-dev] What should TrAX look like? (Was: Re: [xml-dev] Article on JAXP 1.3 "Fast and Easy XML Processing")

To: kjouk@yahoo.co.uk
Subject: Re: [xml-dev] What should TrAX look like? (Was: Re: [xml-dev] Article on JAXP 1.3 "Fast and Easy XML Processing")
From: Elliotte Harold <elharo@metalab.unc.edu>
Date: Tue, 22 Feb 2005 05:05:06 -0500
Cc: xml-dev@lists.xml.org
In-reply-to: <200502220742.53890.kjouk@yahoo.co.uk>
References: <E1D2Si4-0001QA-00@ukmail1.eechost.net> <200502211623.51418.kjones@sarvega.com> <421A0ECF.3090802@metalab.unc.edu> <200502220742.53890.kjouk@yahoo.co.uk>
User-agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.5) Gecko/20041217

Kevin Jones wrote:

> It's probably not a bottleneck but could be a factor, 
> sometimes large. Conversions themselves are just overhead 
> that should be avoided, if at all possible, where I come 
> from.

That itself sounds like folklore. The fact is XQuery can be a very 
computationally expensive operation. A large reason why this language 
was invented was to enable native XML databases that can create their 
own indexable, optimizable data structures that are more amenable to 
querying than traditional models like DOM and XOM. Some engines with 
some queries show two or three orders of magnitude improvement by using 
their internal models instead of directly querying DOMs or XML files.

Of course some queries are so fast anyway that improving by two orders 
of magnitude means you go from 10ms to a tenth of a millisecond, and if 
you only need to do that once, who cares? But sometimes the speed gain 
does matter.

Queries can easily be O(N^2) or worse, depending on what they're asking. 
By contrast, a conversion to a native model can normally be done in O(N), 
where N is the size of the document. If the conversion includes building 
an index that reduces ultimate query time to O(N log N) or less, 
conversion may be worth it for large documents and complicated queries. 
The cost you worry about is normally the size of the additional 
model you have to build. Operating directly on the original data 
structure, whether through its own native interface or an LCD 
(lowest common denominator) interface, 
saves the memory costs of building an additional object structure to 
hold the same content. Whether speed or size matters more varies from 
one problem and environment to the next.
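
To make the trade-off concrete, here is a minimal sketch in Python. The 
document shape (customer and order elements), the function names, and the 
join itself are all made up for illustration; no real query engine is 
implied. The naive version scans every customer for every order, which is 
O(N^2). The indexed version spends one O(N) pass and some extra memory on 
a hash index, after which each lookup is roughly constant time.

```python
import xml.etree.ElementTree as ET


def build_doc(n):
    """Build a hypothetical document with n customers and n orders."""
    root = ET.Element("orders")
    for i in range(n):
        ET.SubElement(root, "customer", id=str(i), name=f"c{i}")
    for i in range(n):
        # Each order points at exactly one customer id.
        ET.SubElement(root, "order", customer=str(n - 1 - i), total=str(i))
    return root


def join_naive(root):
    """O(N^2): for every order, scan all customers in the original tree."""
    pairs = []
    for order in root.iter("order"):
        for cust in root.iter("customer"):
            if cust.get("id") == order.get("customer"):
                pairs.append((cust.get("name"), order.get("total")))
    return pairs


def join_indexed(root):
    """O(N) conversion to an index, then one cheap lookup per order.

    Assumes customer ids are unique; the index is the extra memory
    you pay for the speed.
    """
    by_id = {c.get("id"): c for c in root.iter("customer")}
    pairs = []
    for order in root.iter("order"):
        cust = by_id.get(order.get("customer"))
        if cust is not None:
            pairs.append((cust.get("name"), order.get("total")))
    return pairs
```

Both functions return the same pairs in the same order. Which one is the 
better choice depends on the document size and on how much memory the 
index is allowed to use.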

Using an LCD interface doesn't really affect this calculation one way or 
the other. The engine can always choose whether to create a more 
efficient internal model tailored to its needs or operate directly on 
the native data structure depending on whether it cares more about size 
or speed (and on how efficient its internal representation is). It can 
make this choice both with native and LCD interfaces. The only 
difference is whether there have to be N*M converters (where N is the 
number of engines and M the number of models) or only M LCD interfaces.
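
As a rough sketch of the counting argument, assume a deliberately tiny 
LCD interface with just two methods, name() and children(). The adapter 
classes and the two "engines" below are hypothetical names invented for 
this example; only the ElementTree and minidom calls are real standard 
library APIs. Each model needs one adapter, and every engine written 
against the interface works with every model, so there are M adapters 
rather than N*M converters.

```python
import xml.etree.ElementTree as ET
from xml.dom import Node, minidom


class ElementTreeAdapter:
    """Adapter for the xml.etree.ElementTree model."""

    def name(self, node):
        return node.tag

    def children(self, node):
        return list(node)


class MinidomAdapter:
    """Adapter for the xml.dom.minidom model."""

    def name(self, node):
        return node.tagName

    def children(self, node):
        # minidom mixes text and element nodes; keep elements only.
        return [c for c in node.childNodes
                if c.nodeType == Node.ELEMENT_NODE]


def count_elements(model, node):
    """Engine 1: count elements through the shared interface."""
    return 1 + sum(count_elements(model, c) for c in model.children(node))


def element_names(model, node):
    """Engine 2: collect distinct element names through the interface."""
    names = {model.name(node)}
    for child in model.children(node):
        names |= element_names(model, child)
    return names


XML = "<a><b/><b><c/></b></a>"
et_root = ET.fromstring(XML)
dom_root = minidom.parseString(XML).documentElement
```

Here count_elements(ElementTreeAdapter(), et_root) and 
count_elements(MinidomAdapter(), dom_root) give the same answer without 
either engine knowing which model it is walking. An engine that cares 
more about speed could still use the same interface to copy the data into 
its own internal structure first.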

-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
