RE: XQuery -- Reinventing the Wheel?

From: johns@syscore.com (John F. Schlesinger)

To: Jonathan.Robie@SoftwareAG-USA.com, elenz@xyzfind.com, xml-dev@lists.xml.org

Date: Thu, 22 Feb 2001 13:24:32 -0500

Jonathan wrote:

"A query language needs to be optimizable for queries. To make this possible, we need to be able to discover equivalences so that queries can be rewritten flexibly based on the performance parameters of various kinds of access. Both the XQuery language and the XML Query Algebra are designed to make this possible."

Without trying to get too much into the debate, I would think that if there is a need to re-write queries, the most obvious way to specify the re-write would be to use XSL. This is what it is for, after all. However, to do this the query needs to be XML. Therefore, the need for query re-write is, for me, a strong argument that the query syntax should be XML. This, in turn, suggests that FLWR is inferior to XSLT for this purpose.

<nostalgia>When we were trying to implement query re-write in EDA/SQL we would have given our right arms for a query syntax in XML with a transform language like XSLT.</nostalgia>

Yours,

John Schlesinger

SysCore Solutions

-----Original Message-----
From: Jonathan.Robie@SoftwareAG-USA.com [mailto:Jonathan.Robie@SoftwareAG-USA.com]
Sent: Thursday, February 22, 2001 11:18 AM
To: elenz@xyzfind.com; xml-dev@lists.xml.org
Subject: RE: XQuery -- Reinventing the Wheel?

Evan Lenz wrote:

> Reinventing the Wheel

I think I should start by responding to the title of this message. The phrase "reinventing the wheel" usually refers to reinventing something that already exists because you don't know about it. The editors of XQuery include a former member of the XSL Working Group who has written a fair number of stylesheets. They also include one of the inventors of SQL, one of the inventors of XML-QL, and one of the inventors of XQL, a precursor of XPath. We considered quite a few syntax approaches, including building on XSLT, before arriving at the approach we used.

Also, you imply that we are off on a completely different track than XSLT. In fact, we are working closely together with the XSL Working Group to define XPath 2.0. This includes not only adding features, but deriving a new model for XPath that is able to account for XML Schema types.

> After reviewing the XQuery spec, I'm concluding that the
> overlap between XQuery and XSLT is far too great for the
> W3C to reasonably recommend them both as separate languages.

XQuery and XSLT will share a common expression language, including path expressions. XSLT is really two languages, an XML-based language used to write the templates, and XPath, an expression language used for patterns. Both XQuery and XSLT will use XPath 2.0, and the two Working Groups are working closely together on this. So the two languages will share a great deal.

Why have a new language? Three reasons: (1) ease of use for our use cases, (2) optimizability, (3) strong data typing.

1. Ease of use

XQuery is significantly more straightforward for a lot of common database queries. To some extent, what is straightforward is a matter of taste, a realm where logic does not reach, but I think that some of the reasons are worth stating.

First, simple queries are simpler in XQuery. For instance, an XPath 2.0 expression that uses the abbreviated syntax is also a valid query by itself. This is not true of XSLT. Your document http://www.xmlportfolio.com/xquery.html incorrectly labels XPath expressions as XSLT, but an XSLT processor will not process your examples unless you place them in a template. Consider a simple query that looks for all employees in a set of documents:

//emp

This is much easier to read and write than the equivalent XSLT stylesheet:

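<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

        <xsl:template match="/">
            <xsl:copy-of select="//emp"/>
        </xsl:template>

</xsl:stylesheet>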

This difference is also present for some moderately complex queries. When you consider the following XQuery expression:

/emp[rating = "Poor"]/@mgr->emp/@mgr->emp/name

you compare it to the following XSLT fragment:

<xsl:variable name="poorEmpManagers" select="id(/emp[rating = 'Poor']/@mgr)[self::emp]"/>

I think it would be a fairer comparison if you typed in the entire stylesheet that you would have to write in XSLT. I didn't test this, but I think the following is approximately what you would have to write:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

        <xsl:template match="/">
            <xsl:variable name="poorEmpManagers" select="id(/emp[rating = 'Poor']/@mgr)[self::emp]"/>
            <xsl:copy-of select="id($poorEmpManagers/@mgr)[self::emp]/name"/>
        </xsl:template>

</xsl:stylesheet>

The fact that *any* expression in XQuery is a valid query makes it easier to write simple queries, without the overhead associated with a stylesheet. For what it's worth, here's the shortest XQuery expression that can be executed as a stand-alone query:

Also, the keyword-oriented approach of XQuery is more familiar and comfortable to many programmers. I would rather write:

FOR $b IN document("bib.xml")//book
WHERE $b/publisher = "Morgan Kaufmann"
AND $b/year = "1998"
RETURN $b/title

than

<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
    <xsl:for-each select="document('bib.xml')//book">
      <xsl:if test="publisher='Morgan Kaufmann' and year='1998'">
        <xsl:copy-of select="title"/>
      </xsl:if>
    </xsl:for-each>
</xsl:template>
</xsl:transform>

Note that there's been no great rush to create an XML syntax for Java, JavaScript, Visual Basic, or other high-level programming languages. Several people have attempted to make XML syntaxes for SQL, but I have not been impressed by the results.

2. Conventional Database Functionality

XQuery is more suitable to many of the kinds of queries that SQL programmers are used to. Joins and the distinct() function account for a lot of this - no surprise, since XQuery's FLWR expressions are quite similar to SQL's SELECT/FROM/WHERE. It may make sense, incidentally, to add these to XSLT as well, which is another reason for XQuery and XSLT to continue to work together on XPath 2.0.

To a database person, it is somewhat surprising that your paper does not explicitly mention joins, which are one of the biggest reasons for FLWR expressions in XQuery. Joins are central to database functionality, and it is important to express them in a way that allows optimization based on patterns detected in the expressions. I also notice that the examples in your paper do not include any examples from Section 3 of the XQuery paper, which shows how conventional SQL-like queries are done.

In your paper, you point out that FLWR expressions do have some syntactic similarity to XSLT's <xsl:for-each>. This is true, but it misses the purpose of FLWR expressions, which is to provide general SQL-like functionality for joins and declarative restructuring. A naive mapping of FLWR expressions to <xsl:for-each> is not likely to give you an efficient implementation of joins.
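
To make the point about joins concrete, here is a minimal sketch in Python (not XQuery or XSLT) that contrasts a nested-loop evaluation, which is roughly what a literal loop-inside-a-loop mapping amounts to, with a hash join that an optimizer could choose once it recognizes the join pattern. The sample records and the function names are made up for illustration and do not come from either paper.

```python
from collections import defaultdict

# Hypothetical sample data; not taken from any document in this thread.
EMPS = [
    {"name": "Ann", "dept": "d1"},
    {"name": "Bob", "dept": "d2"},
    {"name": "Cy", "dept": "d1"},
]
DEPTS = [
    {"id": "d1", "title": "Sales"},
    {"id": "d2", "title": "Research"},
]


def nested_loop_join(emps, depts):
    """Loop-inside-a-loop evaluation: len(emps) * len(depts) comparisons."""
    return [
        (e["name"], d["title"])
        for e in emps
        for d in depts
        if e["dept"] == d["id"]
    ]


def hash_join(emps, depts):
    """Index the departments by join key once, then probe per employee."""
    by_id = defaultdict(list)
    for d in depts:
        by_id[d["id"]].append(d)
    return [
        (e["name"], d["title"])
        for e in emps
        for d in by_id.get(e["dept"], [])
    ]
```

Both functions return the same pairs in the same order; only the amount of work differs, which is the kind of choice a declarative join leaves open to the implementation.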

You do give an example that combines a join with distinct(). The XQuery looks like this:

FOR $p IN distinct(document("bib.xml")//publisher)
LET $a := avg(document("bib.xml")
   /book[publisher = $p]/price)
RETURN
   <publisher>
      <name> $p/text() </name> ,
      <avgprice> $a </avgprice>
   </publisher>

The equivalent XSLT looks like this:

<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
    <xsl:for-each select="document('bib.xml')//publisher[not(.=preceding::publisher)]">
      <xsl:variable name="prices"
                    select="document('bib.xml')/book[publisher=current()]/price"/>
      <xsl:variable name="avgPrice" select="sum($prices) div count($prices)"/>
      <publisher>
        <name><xsl:value-of select="."/></name>
        <avgprice><xsl:value-of select="$avgPrice"/></avgprice>
      </publisher>
    </xsl:for-each>
</xsl:template>
</xsl:transform>

Again, I find the XQuery solution much easier to read and write. This is the kind of thing XQuery was designed for. More important, in XQuery, we have been thinking of database optimization, and I think we will be able to figure out how to optimize the XQuery equivalent better.
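
As a rough illustration of the kind of rewrite an optimizer might apply to a query like this, the Python sketch below computes the per-publisher average in one pass over the books instead of rescanning the document for each distinct publisher. The sample data and the function name are assumptions for illustration only, and nothing here describes how an actual XQuery or XSLT processor works.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical stand-in for bib.xml; the real file is not shown in this thread.
BIB = """
<bib>
  <book><publisher>Morgan Kaufmann</publisher><price>40</price></book>
  <book><publisher>Addison-Wesley</publisher><price>65.95</price></book>
  <book><publisher>Morgan Kaufmann</publisher><price>60</price></book>
</bib>
"""


def average_price_by_publisher(xml_text):
    """Group prices by publisher in a single pass, then average each group."""
    prices = defaultdict(list)
    for book in ET.fromstring(xml_text).iter("book"):
        publisher = book.findtext("publisher")
        prices[publisher].append(float(book.findtext("price")))
    return {pub: sum(vals) / len(vals) for pub, vals in prices.items()}
```

Calling average_price_by_publisher(BIB) yields one entry per distinct publisher, keyed by publisher name.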

2. Optimizability

A query language needs to be optimizable for queries. To make this possible, we need to be able to discover equivalences so that queries can be rewritten flexibly based on the performance parameters of various kinds of access. Both the XQuery language and the XML Query Algebra are designed to make this possible.
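
One small example of such an equivalence is pushing a filter below a join. The Python sketch below uses a toy plan representation invented for this illustration (it is not the XML Query Algebra, and the operator names, fields, and sample rows are all hypothetical) to show a rewrite that changes the plan but not the result.

```python
# Hypothetical mini-algebra, invented for illustration only.
# Plans are nested tuples:
#   ("scan", rows)
#   ("filter", field, value, plan)        keep rows where row[field] == value
#   ("join", key, left_plan, right_plan)  merge rows that agree on key

BOOKS = [
    {"pub": "MK", "title": "A", "year": "1998"},
    {"pub": "AW", "title": "B", "year": "1998"},
    {"pub": "MK", "title": "C", "year": "2000"},
]
PUBLISHERS = [{"pub": "MK", "city": "SF"}, {"pub": "AW", "city": "Boston"}]


def evaluate(plan):
    op = plan[0]
    if op == "scan":
        return list(plan[1])
    if op == "filter":
        _, field, value, child = plan
        return [r for r in evaluate(child) if r.get(field) == value]
    if op == "join":
        _, key, left, right = plan
        right_rows = evaluate(right)
        return [{**l, **r} for l in evaluate(left)
                for r in right_rows if l[key] == r[key]]
    raise ValueError(f"unknown operator: {op}")


def fields_of(plan):
    """Field names a plan can produce, read off the scanned rows."""
    op = plan[0]
    if op == "scan":
        return {f for row in plan[1] for f in row}
    if op == "filter":
        return fields_of(plan[3])
    return fields_of(plan[2]) | fields_of(plan[3])


def push_filter_down(plan):
    """Rewrite filter(join(L, R)) as join(filter(L), R) if only L has the field."""
    if plan[0] == "filter" and plan[3][0] == "join":
        _, field, value, (_, key, left, right) = plan
        if field in fields_of(left) and field not in fields_of(right):
            return ("join", key, ("filter", field, value, left), right)
    return plan


plan = ("filter", "year", "1998",
        ("join", "pub", ("scan", BOOKS), ("scan", PUBLISHERS)))
rewritten = push_filter_down(plan)
```

Here evaluate(plan) and evaluate(rewritten) produce the same rows, but the rewritten plan joins fewer books. Whether that is actually cheaper depends on the access paths available, which is why the rewrite has to be a choice rather than a fixed rule.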

3. Strong Typing

XQuery will be a strongly typed language. This typing will extend to content models - a function whose return type is "paragraph element" will return a valid paragraph element. This level of strong typing is very helpful in industrial-strength programming environments, and difficult to achieve with the current XSLT. Much of the effort on the Query Algebra, and much of the justification for it, lies in achieving strong typing.

In fact, XSLT may benefit from this work. It would be helpful to have stronger typing in XSLT as well. For instance, I would like to be able to check whether a given stylesheet will always produce valid HTML 4.0 for a given DTD. Several people are investigating this - it is much too early to say whether it can be achieved.

At any rate, I hope this helps explain why I think XQuery is worth developing as a language, in addition to XSLT.

Jonathan
