> I think that optimization of // is a more compelling way to use knowledge
> of complex types. Suppose you have a pattern like this:
>
> //address
>
> Without knowledge of the complex types involved, this requires examination
> of all elements in the document to see if they are "address" elements.
> Looking at the schema for a particular invoice document, it is easy to see
> that the above pattern can only match shipping or billing addresses found
> in customers. The optimizer can rewrite the above pattern as follows:
>
> /customer/billing/address | /customer/shipping/address
>
> In at least some environments, this will be much more efficient to execute.
Yes, //address makes a very good case study. But the answer isn't clear-cut.
The above rewrite is one way of optimizing it. Another way is to use an
index. Here there is a real difference between XSLT and XQuery: with XQuery,
the documents are typically built and indexed long before the query is
written, so as with a relational database, indexing decisions are made by
database designers based on guesswork about the future query workload. With
XSLT, the stylesheet is usually compiled before the source document(s) are
parsed and turned into trees, so the stylesheet can direct that indexes
should be built to support the access paths it wants to be fast. One idea I
have played with in Saxon is to build a mini-schema for the document as it
is being parsed (essentially an A-contains-B graph) - this could potentially
be more useful than the actual schema since it describes what is actually
present in the document, rather than what is permitted to be present.
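
To make the mini-schema idea concrete, here is a minimal sketch in Python. It is not Saxon's code: the function names (build_containment_graph, expand_descendant_step) and the sample invoice document are invented for illustration. It records an A-contains-B graph while the document is parsed, then uses that graph to expand //address into the explicit paths that actually occur in this document. The sketch assumes the containment graph is not recursive; if it is, it gives up so the caller can fall back to a full scan.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_containment_graph(xml_source):
    """Parse the document once and record which element names contain which.

    Returns (root_name, graph), where graph maps a parent element name to
    the set of child element names actually seen beneath it.
    """
    graph = defaultdict(set)
    stack = []          # names of the currently open elements
    root_name = None
    for event, elem in ET.iterparse(xml_source, events=("start", "end")):
        if event == "start":
            if stack:
                graph[stack[-1]].add(elem.tag)
            else:
                root_name = elem.tag
            stack.append(elem.tag)
        else:
            stack.pop()
    return root_name, dict(graph)


def expand_descendant_step(root_name, graph, target):
    """List every absolute path in the graph that ends at an element named target.

    Raises ValueError on recursive containment, where no finite list of
    paths exists and a scan (or an index) is needed instead.
    """
    paths = []

    def walk(name, prefix, open_names):
        path = prefix + "/" + name
        if name == target:
            paths.append(path)
        for child in sorted(graph.get(name, ())):
            if child in open_names:
                raise ValueError("recursive structure: cannot enumerate paths")
            walk(child, path, open_names | {child})

    walk(root_name, "", {root_name})
    return paths


# Hypothetical invoice fragment, shaped like the example under discussion.
SAMPLE = b"""<customer>
  <name>A. Customer</name>
  <billing><address>1 Main St</address></billing>
  <shipping><address>2 Side St</address></shipping>
</customer>"""

root, graph = build_containment_graph(io.BytesIO(SAMPLE))
rewritten = " | ".join(expand_descendant_step(root, graph, "address"))
print(rewritten)
```

Because the graph describes what is present rather than what is permitted, the expansion is only valid for the document it was built from.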
What Saxon actually does with //address is to compute it from a full
document scan the first time it is used on a particular document, then to
save the results for subsequent occasions: a sort of "just-in-time index". I
think it's high time dynamic indexing ideas for persistent databases were
revisited; most of the work I've seen dates from the 1970s.
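
The "just-in-time index" can be sketched the same way. Again this is only an illustration in Python, not the Saxon implementation: the class name JitIndexedDocument, its scans counter, and the sample document are all made up here. The first request for a given element name pays for a full scan of the document, and later requests for the same name on the same document are answered from the saved result. The sketch assumes the tree is not modified after it is built, which holds for an XSLT source tree.

```python
import xml.etree.ElementTree as ET


class JitIndexedDocument:
    """Wraps a parsed document and caches the result of //name lookups."""

    def __init__(self, root):
        self.root = root
        self._index = {}   # element name -> list of elements in document order
        self.scans = 0     # number of full document scans performed

    def descendants_named(self, name):
        """Return all elements called name, scanning only on first use."""
        if name not in self._index:
            self.scans += 1
            # Element.iter walks the whole tree in document order,
            # including the root element itself.
            self._index[name] = list(self.root.iter(name))
        return self._index[name]


# Hypothetical sample document for illustration.
doc = JitIndexedDocument(ET.fromstring(
    "<customer>"
    "<billing><address>1 Main St</address></billing>"
    "<shipping><address>2 Side St</address></shipping>"
    "</customer>"
))

first = doc.descendants_named("address")    # triggers the one full scan
second = doc.descendants_named("address")   # served from the saved result
print([e.text for e in first], doc.scans)
```

The trade-off is that nothing is built for access paths that are never used, at the cost of one slow first evaluation per element name.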
All this goes to prove that there is scope for plenty of PhD theses on
XPath/XQuery optimization. I think it's obvious that there are optimizations
that can be done with knowledge of the schema that can't be done without,
but exactly what those optimizations are is still a matter for research.