XQuery: is FLWR a <xsl:foreach/> ?
- From: Jonathan Robie <Jonathan.Robie@SoftwareAG-USA.com>
- To: Evan Lenz <elenz@xyzfind.com>, Jonathan.Robie@SoftwareAG-USA.com, xml-dev@lists.xml.org
- Date: Sat, 24 Feb 2001 11:29:22 -0500
At 06:38 AM 2/23/2001 -0800, Evan Lenz wrote:
>Jonathan Robie wrote:
>
> > To a database person, it is somewhat surprising that your
> > paper does not explicitly mention joins, which are one of
> > the biggest reasons for FLWR expressions in XQuery. Joins
> > are central to database functionality, and it is important
> > to express them in a way that allows optimization based on
> > patterns detected in the expressions. I also notice that the
> > examples in your paper do not include any examples from
> > Section 3 of the XQuery paper, which shows how conventional
> > SQL-like queries are done.
>
>That's because Section 3 does not introduce any new query functionality.
>Using joins over an XML view of a relational database is just another use
>of the FLWR expression. The XSLT mappings to these are just as determinate
>as the rest.
I think this is a central difference between our views. One of the reasons
for FLWR expressions is that there is an extensive literature on optimizing
these kinds of expressions in SQL, OQL, and various tree-structured
languages related to them. Although it may be possible to perform similar
optimizations on XSLT, that clearly falls into the "future research"
category. The fundamental question is this: how does a query optimizer
recognize patterns in the query, correlate them with information about the
schema and the data, and rewrite the query in ways that are provably
equivalent and perform much faster?
For instance, suppose I have the following XQuery:
FOR $i IN //invoice,
    $p IN distinct($i//product)
WHERE $i/customer = "ACME"
  AND $p/name = "screwdriver"
RETURN
    <product_ordered>
        $p, $i/date
    </product_ordered>
The query optimizer should be able to see that the WHERE conditions can be
lifted into the path expressions, as predicates on the invoices and on the
products being bound:
FOR $i IN //invoice[customer="ACME"][.//product/name="screwdriver"],
    $p IN distinct($i//product[name="screwdriver"])
RETURN
    <product_ordered>
        $p, $i/date
    </product_ordered>
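To make the equivalence concrete, here is a minimal sketch in Python, assuming a hypothetical in-memory model where each invoice has a customer, a date, and a list of product names (the data and the helper names `naive`, `pushed_down`, and `invoice_filter_only` are illustrative only, and `distinct` here removes duplicate names rather than duplicate nodes). It checks that filtering before binding gives the same result as binding everything and filtering afterwards, and shows that the name test must stay on the `$p` binding: lifting it only to the invoice level would also return the hammer.

```python
# Minimal sketch, assuming a hypothetical in-memory model of the invoices:
# each invoice has a customer, a date, and a list of product names.
from typing import Dict, Iterable, List, Tuple

INVOICES: List[Dict] = [
    {"customer": "ACME", "date": "2001-02-01",
     "products": ["screwdriver", "hammer", "screwdriver"]},
    {"customer": "ACME", "date": "2001-02-02", "products": ["wrench"]},
    {"customer": "Globex", "date": "2001-02-03", "products": ["screwdriver"]},
]


def distinct(items: Iterable[str]) -> List[str]:
    """Order-preserving duplicate removal, standing in for distinct()."""
    seen, out = set(), []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def naive(invoices) -> List[Tuple[str, str]]:
    """Bind every ($i, $p) pair first, then apply the WHERE conditions."""
    return [(p, i["date"])
            for i in invoices
            for p in distinct(i["products"])
            if i["customer"] == "ACME" and p == "screwdriver"]


def pushed_down(invoices) -> List[Tuple[str, str]]:
    """Filter invoices before binding $p, and keep the name test on $p."""
    candidates = [i for i in invoices
                  if i["customer"] == "ACME" and "screwdriver" in i["products"]]
    return [(p, i["date"])
            for i in candidates
            for p in distinct(x for x in i["products"] if x == "screwdriver")]


def invoice_filter_only(invoices) -> List[Tuple[str, str]]:
    """NOT equivalent: name test lifted to the invoice but dropped from $p."""
    candidates = [i for i in invoices
                  if i["customer"] == "ACME" and "screwdriver" in i["products"]]
    return [(p, i["date"]) for i in candidates for p in distinct(i["products"])]


print(naive(INVOICES))                # [('screwdriver', '2001-02-01')]
print(pushed_down(INVOICES))          # [('screwdriver', '2001-02-01')]
print(invoice_filter_only(INVOICES))  # [('screwdriver', '2001-02-01'), ('hammer', '2001-02-01')]
```

The pushed-down form touches only one invoice's products instead of every product in the store, which is exactly the kind of rewrite an optimizer needs to be able to prove safe.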
Now the query optimizer can look to see whether the datastore has an index
on customer or on product name. Perhaps the indexes also record how many
invoices match each key. If there are tens of thousands of invoices for
ACME but only one invoice for a screwdriver, the optimizer should start from
the product-name index and check the customer afterwards; if there are tens
of thousands of invoices for screwdrivers but only one for ACME, it should
do the reverse.
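A minimal sketch of that decision in Python, assuming hypothetical index statistics given as plain dictionaries from key to number of matching invoices (real optimizers use far richer cost models; the function name and return strings are illustrative only):

```python
# Minimal sketch of cost-based access-path selection, assuming hypothetical
# index statistics that record how many invoices match each key.
from typing import Dict


def choose_access_path(customer_counts: Dict[str, int],
                       product_counts: Dict[str, int],
                       customer: str, product: str) -> str:
    """Decide which index to probe first for the customer and product-name tests.

    The index with fewer matching invoices is probed; the other predicate
    is then checked on each candidate invoice. Ties go to the customer index.
    A key missing from the statistics counts as zero matches.
    """
    via_customer = customer_counts.get(customer, 0)
    via_product = product_counts.get(product, 0)
    if via_product < via_customer:
        return "product-name index, then check customer"
    return "customer index, then check product name"


# Many ACME invoices, one screwdriver invoice: start from the product name.
print(choose_access_path({"ACME": 40000}, {"screwdriver": 1}, "ACME", "screwdriver"))
# The reverse distribution: start from the customer.
print(choose_access_path({"ACME": 1}, {"screwdriver": 40000}, "ACME", "screwdriver"))
```

This choice is only available because the rewritten query exposes both conditions as path predicates the optimizer can match against known indexes.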
At any rate, my own knowledge of query optimization is not deep, so I don't
want to play the expert here. Guido Moerkotte has written an excellent
survey of query optimization techniques, which you can access here:
http://pi3.informatik.uni-mannheim.de/staff/mitarbeiter/moer/querycompiler.ps
If you want us to use XSLT syntax directly instead of our FLWR
expressions, I need to know the answers to questions like these:
1. What are the equivalences that can be exploited for query optimization?
2. What are the typing rules for the possible <xsl:for-each/> constructs?
3. How are the various possible <xsl:for-each/> constructs translated into
SQL? (Fill in your favorite environment in place of SQL.)
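For comparison, here is a rough sketch of what question 3 looks like on the FLWR side, assuming a hypothetical relational view of the invoices with tables `invoice(id, customer, date)` and `line_item(invoice_id, product_name)`; the schema, data, and index names are illustrative only. The two FOR bindings become a join, the WHERE conditions become SQL predicates, and keeping `i.id` in the `DISTINCT` approximates the per-invoice `distinct()` (by product name, not node identity). It runs against an in-memory SQLite database from Python's standard library.

```python
# Minimal sketch, assuming a hypothetical relational view of the invoices:
# invoice(id, customer, date) and line_item(invoice_id, product_name).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer TEXT, date TEXT);
CREATE TABLE line_item (invoice_id INTEGER REFERENCES invoice(id),
                        product_name TEXT);
-- Indexes on the two predicate columns, for the optimizer to choose between.
CREATE INDEX invoice_customer ON invoice(customer);
CREATE INDEX line_item_name ON line_item(product_name);
INSERT INTO invoice VALUES (1, 'ACME', '2001-02-01'),
                           (2, 'ACME', '2001-02-02'),
                           (3, 'Globex', '2001-02-03');
INSERT INTO line_item VALUES (1, 'screwdriver'), (1, 'hammer'),
                             (1, 'screwdriver'), (2, 'wrench'),
                             (3, 'screwdriver');
""")

# The FOR bindings become a join; keeping i.id in the DISTINCT makes
# duplicate removal happen per invoice, as distinct($i//product) does.
FLWR_AS_SQL = """
SELECT DISTINCT i.id, li.product_name, i.date
FROM invoice AS i
JOIN line_item AS li ON li.invoice_id = i.id
WHERE i.customer = 'ACME' AND li.product_name = 'screwdriver'
ORDER BY i.id
"""

rows = [(name, date) for _id, name, date in conn.execute(FLWR_AS_SQL)]
print(rows)  # [('screwdriver', '2001-02-01')]
```

Once the query is in this form, choosing between the two indexes is the relational engine's ordinary job, which is the body of existing work the questions above are asking XSLT to connect to.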
Do you know of any good work in these areas? Please don't ask me to do it
myself, or to prove that it cannot be done. If we want a solution in a
reasonable amount of time, we should build on work that already exists.
Jonathan
There are also aspects of XQuery optimization that fall solidly into the
"future research" category.
These are my opinions right now. They may be quite different from the
opinions of Software AG, the W3C XML Query Working Group, or the opinions
that I will have after reading and considering your response.