On Feb 22, 2005, at 2:05 AM, Elliotte Harold wrote:
> Some engines with some queries show two or three orders of magnitude
> improvement by using their internal models instead of directly
> querying DOMs or XML files.
This has less to do with the structure of an internal model, and more to
do with more or less clever optimization/indexing, independent of the
model.
> Of course some queries are so fast anyway that improving by two orders
> of magnitude means you go from 10ms to a tenth of a millisecond,
I remember running a very simple "//item" query with the XOM CVS XPath
toy. It turned out it would have taken a month to run to completion:
quadratic time complexity where one would expect O(N). With another
engine it ran in less than a second. Similar results for other queries.
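To illustrate where that kind of blowup comes from, here is a hypothetical Python sketch (not XOM's API, and not any real engine's code): a naive descendant-axis evaluator that restarts a full subtree scan at every node does work proportional to the sum of all subtree sizes, which is quadratic on a deep, path-shaped document, while a single shared traversal is linear.

```python
# Hypothetical toy tree; names and functions are made up for illustration.
class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children if children is not None else []

def all_nodes(root):
    """One depth-first traversal: visits each node exactly once, O(N)."""
    out = [root]
    stack = list(root.children)
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(n.children)
    return out

def quadratic_query(root, name):
    """Naive descendant axis: restart a full subtree scan at every node.
    Total work is the sum of all subtree sizes - O(N^2) on a deep,
    path-shaped document."""
    seen = set()
    for node in all_nodes(root):
        for d in all_nodes(node):
            if d.name == name:
                seen.add(id(d))
    return len(seen)

def linear_query(root, name):
    """Single shared traversal: O(N)."""
    return sum(1 for n in all_nodes(root) if n.name == name)

# A path-shaped "document" of depth 200: same answer, very different cost.
root = Node("root")
cur = root
for _ in range(200):
    child = Node("item")
    cur.children.append(child)
    cur = child

assert quadratic_query(root, "item") == linear_query(root, "item") == 200
```

Both evaluators return the same result; only the shape of the traversal differs, which is exactly why two engines can disagree by orders of magnitude on the same query.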
> and if you only need to do that once, who cares? But sometimes the
> speed gain does matter.
I should hope so; most people are not prepared to wait a month.
> Queries can easily be O(N^2) or worse, depending on what they're
In practice, queries are computationally simple more often than not
(modulo OLAP). Further, most queries that would be expensive with a
naive engine are rather cheap in an RDBMS: O(1), O(log N), O(N) or O(N
log N), joins with O(N * min(N,M)). This is due to the use of
appropriate indexes, rewriting, materialized views, etc. For the most
part XQuery isn't any different than SQL wrt. query optimization, even
though general purpose indexing is way harder. So long term the same
pattern is to be expected for XQuery. Given that most queries are
simple, large constant factors in O(N) still matter, of course,
including for parsing, serialization and model conversions. When was
the last time you saw an app with significant throughput demands happily
declare that XML parsing speed doesn't matter because it's "just" O(N)?
> By contrast a conversion to a native model should normally be able to
> be done in O(N), where is N is the size of the document. If the
> conversion includes building an index that reduces ultimate query time
> to O(NlogN) or less, conversion may be worth it for large documents
> and complicated queries.
That makes little difference; pretty much the same indexes can be built
whether or not one is using model conversion.
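To make that concrete: an element-name index can be built in a single O(N) pass over a SAX-style event stream, with no intermediate tree at all. The event format below is made up for illustration; it is not any real parser's API.

```python
from collections import defaultdict

def index_events(events):
    """Build a name -> element-position index in one O(N) pass over
    ("start", name) / ("end", name) events - no model conversion needed."""
    idx = defaultdict(list)
    for pos, (kind, name) in enumerate(events):
        if kind == "start":
            idx[name].append(pos)
    return dict(idx)

# A tiny hypothetical event stream for <doc><item/><item/></doc>.
events = [("start", "doc"), ("start", "item"), ("end", "item"),
          ("start", "item"), ("end", "item"), ("end", "doc")]

assert index_events(events)["item"] == [1, 3]
```

The same pass could just as easily run while converting to a native model, which is why the decision to convert and the decision to index are largely independent.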