OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

 


 

   Re: [xml-dev] What should TrAX look like? (Was: Re: [xml-dev] Article on

[ Lists Home | Date Index | Thread Index ]

On Feb 22, 2005, at 2:05 AM, Elliotte Harold wrote:

> Some engines with some queries show two or three orders of magnitude 
> improvement by using their internal models instead of directly 
> querying DOMs or XML files.

This has less to do with the structure of an iternal model, and more to 
do with more or less clever optimization/indexing independent of the 
object model.

>
> Of course some queries are so fast anyway that improving by two orders 
> of magnitude means you go from 10ms to a tenth of a millisecond,

I remember running a very simple "//item" with the XOM CVS XPath toy. 
Turns out it would have taken a month to run to completion - quadratic 
time complexity where one would expect O(N). With another engine it ran 
in less than a second. Similar for other queries.

> and if you only need to do that once, who cares? But sometimes the 
> speed gain does matter.

I should hope so, most people are not prepared to wait for a month.

>
> Queries can easily be O(N^2) or worse, depending on what they're 
> asking.

In practise queries are computationally simple more often than not 
(modulo OLAP). Further, most queries that would be expensive with a 
naive engine are rather cheap in an RDBMS: O(1), O(log N), O(N) or O(N 
log N), joins with O(N * min(N,M)). This is due to the use of 
appropriate indexes, rewriting, materialized views, etc. For the most 
part XQuery isn't any different than SQL wrt. query optimization, even 
though general purpose indexing is way harder. So long term the same 
pattern is to be expected for XQuery. Given that most queries are 
simple, large constant factors in O(N) still matter, of course, 
including for parsing, serialization and model conversions. When was 
the last time you've seen apps with significant throughput demands 
happily say XML parsing speed wouldn't matter because it's "just" O(N)?

> By contrast a conversion to a native model should normally be able to 
> be done in O(N), where is N is the size of the document. If the 
> conversion includes building an index that reduces ultimate query time 
> to O(NlogN) or less, conversion may be worth it for large documents 
> and complicated queries.

That makes little difference; pretty much the same indexes can be build 
whether or not one is using model conversion.

Wolfgang.





 

News | XML in Industry | Calendar | XML Registry
Marketplace | Resources | MyXML.org | Sponsors | Privacy Statement

Copyright 2001 XML.org. This site is hosted by OASIS