Re: [xml-dev] The limitations of XPath and navigation for XML database

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]

Re: [xml-dev] The limitations of XPath and navigation for XML database processing

From: Jim Melton <jim.melton@acm.org>
To: mike@adatinc.com
Date: Fri, 08 Feb 2008 15:27:04 -0700

I just have to leap into this discussion, since SQL is being bandied about ;^) (As readers will either know already or see in my signature block at the end of this message, I have a very close relationship with SQL that has lasted more than two decades.)

First, I think that I agree with Mike Kay on one important point. He said "I believe that XPath 2.0 is relationally complete and that it can therefore perform any query that SQL can perform." XPath 2.0, along with XQuery 1.0, was designed to be a complete expression language and to be computationally complete. I'm not quite certain what Mike means by "relationally complete", but I assume that he means that the language can perform operations equivalent to those defined by the relational model.

Second, I have to disagree with Mike Kay on another very important point. He went on to say "In fact it can do more, because it can do some (not all) recursive queries, which are beyond the capability of SQL without extensions, and which are very important in processing many kinds of hierarchical data." This statement is simply not true at all. Standard SQL has had recursive querying capabilities -- meaning *without extensions* -- since SQL:1999 was published (which Mike David mentioned). A number of SQL products have that capability built in (although some use non-standard syntax that accomplishes the goal). Of course, SQL:2003 also has that feature, and the soon-to-be-published SQL:2008 will have it as well.

Finally, Mike Kay's statement that "...I haven't written any SQL for about 25 years" caught my eye. If my arithmetic is correct, that would be about 1983, long before SQL-86 was published, and about the time that commercial products really began to appear. Things have changed a lot since then, Mike ;^) Seriously, I hope that your use of the word "extensions" doesn't imply that you think that all SQL features added since SQL-86 are "extensions".

Enjoying the thread,
Jim

At 2/8/2008 01:46 PM, mike@adatinc.com wrote:

My reply to Michael Kay is in red.

>XPath user navigation for selecting data going down a path does require coding the data types in the order they are retrieved.

Sorry that simply isn't true. You can (and products do) implement a path expression such as A/B[@id='4']/C using a wide variety of different access strategies. You do not have to "go down the path" from the root to the leaves; you can start at the leaves and work up, or start in the middle and work both ways. This expression is not a sequence of step-by-step instructions, it is a request to find C elements that have a B parent whose @id is 4 and whose parent is an A: it is a declarative specification of the characteristics of the elements that you want to retrieve. Suggesting this is procedural is like suggesting that regular expressions are procedural - you've confused the specification and your first instinct at a naive algorithm for implementing it.

I will say I should have used the word ?encountered? instead ?retrieved? in my statement above. A before B, B before C. The user needs to know the structure and specify the navigation in that order. As an external user communicating what I want, A/B[@ID=?4?]/C, says: starting at the As, goto Bs with an ID of ?4?, and then select all Cs with the qualified B parents. How it is performed internally is not the user?s concern as long as the internal processing semantics match the external operational semantics. In SQL: ?SELECT C.vals FROM ABCview WHERE B.id=?4?? is a more nonprocedural having separated structure definition from the data request no longer requires the user to have knowledge of structure enabling their query to be free of the structure. This is navigationless processing. This does not imply that no navigation is occurring internally.

>Multi-leg queries (known as LCA queries) such as selecting data from one leg of a hierarchical structure based on data in another leg of the structure was used in the example at the end of the article. So there was an example demonstrating a multi-leg query. LCA stands for Lowest Common Ancestor and also goes by a couple of less used acronyms.

Sorry, I only got to about page 6 before giving up. Page 6 has a "Conclusion", namely "Even XQuery designed from the ground up to process XML structures requires procedural navigation keeping its processing limited to linear processing for the most part." which I think is so fundamentally wrong (and it's not exactly a pleasure to read either) that I didn't feel it was worth reading on to the Appendix.

The paragraph this is mentioned in is talking about the XML industry has yet to advance to full nonlinear multi-leg hierarchical processing because separate navigations of linear legs can not be practically joined to preserve and process full hierarchical LCA queries in XQuery. XQuery also supports the inner join which flattens hierarchical structures invalidating hierarchical structures for correct hierarchical output.

The ANSI SQL hierarchical prototype used in the example can perform full multi-leg (nonlinear) hierarchical processing by limiting its operation to only defining and operating on hierarchical structures to produce fully accurate hierarchically processed multi-leg queries. The standard ANSI SQL engine is performing this full hierarchical processing. ANSI SQL?s Cartesian product operation supports full LCA processing when performing hierarchically. The SQL hierarchical prototype is using a standard ANSI SQL commercially available processor as its hierarchical processor engine.

I believe that XPath 2.0 is relationally complete and that it can therefore perform any query that SQL can perform. In fact it can do more, because it can do some (not all) recursive queries, which are beyond the capability of SQL without extensions, and which are very important in processing many kinds of hierarchical data. For queries of this complexity, however, XQuery is a much more suitable candidate than XPath. I agree that XPath 1.0 was not able to perform arbitrary joins - but that's history.

With SQL-92, one sided (hierarchical) outer joins with separate ON clause joins at each join point, they can link back on the hierarchical structure being modeled to define multi-leg structures. SQL?s syntax models the hierarchical structure; SQL?s semantics processes it hierarchically. I do not beleive XPath 2.0 or XQuery can match this natural full multi-leg hierarchical data modeling and processing capability

SQL-99 does support recursive joins.

The main reason XQuery is more suitable is that it allows you to write functions, which are the equivalent of views in SQL (though again they are more powerful because they can be recursive, which is needed for hierarchical data).

The ANSI SQL hierarchical prototype uses standard SQL views to define full nonlinear multi-leg hierarchical structures. The prototype treats them as hierarchical structure meta data and can adapt processing automatically to each specific query. This allows it to hierarchically optimize the query request based on the query selection specification eliminating overhead for global views by dynamically removing unneeded hierarchical paths from processing. Hierarchical optimization can only be performed on hierarchically structured data. This is performed totally before standard SQL optimization processing is performed. These hierarchical views can be easily dynamically combined into hierarchical structures also using the standard one sided outer join. SQL-99 supported recursive structures may be able to be adapted to full hierarchical recursive structures.

I haven't attempted to code your example in XQuery because I don't think I have fully understood the query specification. That's partly because I haven't written any SQL for about 25 years and I find it very hard to remember its arcane syntax and semantics; also you seem to be using SQL extensions that I haven't come across before.

With SQL?s arcane syntax, the SQL ON (join) clause is similar to XPath?s navigation and data filtering/qualification processing, while the WHERE clause operates hierarchically across the total nonlinear multi-leg structure being processed. So there is a big difference between SQL?s ON and WHERE clauses, especially for hierarchical processing use. Multi-leg queries require LCA processing as described above to operate automatically in ANSI SQL. The more hierarchical paths involved in the query, the more compounded the LCA processing becomes with multiple LCAs that can also be nested. This is too complex and error prone for XQuery to perform practically. As some proof of this, google ?XQuery LCA? and you will locate a number of academic projects that attempted to add LCA processing on top of XQuery and why they felt the need to.

Regards,

/Mike (David)

========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
Co-Chair, W3C XML Query WG; XQX (etc.) editor    Fax : +1.801.942.3345
Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
1930 Viscounti Drive      Standards email: jim dot melton at acm dot org
Sandy, UT 84093-1063 USA          Personal email: jim at melton dot name
========================================================================
= Facts are facts.   But any opinions expressed are the opinions      =
= only of myself and may or may not reflect the opinions of anybody   =
= else with whom I may or may not have discussed the issues at hand. =
========================================================================

Follow-Ups:
- RE: [xml-dev] The limitations of XPath and navigation for XML database processing
  - From: "Michael Kay" <mike@saxonica.com>

References:
- Re: [xml-dev] The limitations of XPath and navigation for XML database processing
  - From: mike@adatinc.com

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]