[Date Prev]
| [Thread Prev]
| [Thread Next]
| [Date Next]
--
[Date Index]
| [Thread Index]
Re: [xml-dev] The limitations of XPath and navigation for XML database processing
- From: Jim Melton <jim.melton@acm.org>
- To: mike@adatinc.com
- Date: Fri, 08 Feb 2008 15:27:04 -0700
I just have to leap into this discussion, since SQL is being bandied
about ;^) (As readers will either know already or see in my
signature block at the end of this message, I have a very close
relationship with SQL that has lasted more than two decades.)
First, I think that I agree with Mike Kay on one important point.
He said "I believe that XPath 2.0 is relationally complete and that
it can therefore perform any query that SQL can perform."
XPath 2.0, along with XQuery 1.0, was designed to be a complete
expression language and to be computationally complete. I'm not
quite certain what Mike means by "relationally complete", but I
assume that he means that the language can perform operations equivalent
to those defined by the relational model.
Second, I have to disagree with Mike Kay on another very important
point. He went on to say "In fact it can do more, because it
can do some (not all) recursive queries, which are beyond the capability
of SQL without extensions, and which are very important in processing
many kinds of hierarchical data." This statement is simply not
true at all. Standard SQL has had recursive querying capabilities
-- meaning *without extensions* -- since SQL:1999 was published (which
Mike David mentioned). A number of SQL products have that
capability built in (although some use non-standard syntax that
accomplishes the goal). Of course, SQL:2003 also has that feature,
and the soon-to-be-published SQL:2008 will have it as well.
Finally, Mike Kay's statement that "...I haven't written any SQL for
about 25 years" caught my eye. If my arithmetic is correct,
that would be about 1983, long before SQL-86 was published, and about the
time that commercial products really began to appear. Things have
changed a lot since then, Mike ;^) Seriously, I hope that your use
of the word "extensions" doesn't imply that you think that all
SQL features added since SQL-86 are "extensions".
Enjoying the thread,
Jim
At 2/8/2008 01:46 PM, mike@adatinc.com wrote:
My reply to Michael Kay is in
red.
>XPath user navigation for
selecting data going down a path does require coding the data types in
the order they are retrieved.
Sorry that simply isn't true. You can (and products do) implement a path
expression such as A/B[@id='4']/C using a wide variety of different
access strategies. You do not have to "go down the path" from
the root to the leaves; you can start at the leaves and work up, or start
in the middle and work both ways. This expression is not a sequence of
step-by-step instructions, it is a request to find C elements that have a
B parent whose @id is 4 and whose parent is an A: it is a declarative
specification of the characteristics of the elements that you want to
retrieve. Suggesting this is procedural is like suggesting that regular
expressions are procedural - you've confused the specification and your
first instinct at a naive algorithm for implementing it.
I will say I should have used the
word ?encountered? instead ?retrieved? in my statement above. A before B,
B before C. The user needs to know the structure and specify the
navigation in that order. As an external user communicating what I want,
A/B[@ID=?4?]/C, says: starting at the As, goto Bs with an ID of ?4?, and
then select all Cs with the qualified B parents. How it is performed
internally is not the user?s concern as long as the internal processing
semantics match the external operational semantics. In SQL: ?SELECT
C.vals FROM ABCview WHERE B.id=?4?? is a more nonprocedural having
separated structure definition from the data request no longer requires
the user to have knowledge of structure enabling their query to be free
of the structure. This is navigationless processing. This does not
imply that no navigation is occurring internally.
>Multi-leg queries (known as LCA
queries) such as selecting data from one leg of a hierarchical structure
based on data in another leg of the structure was used in the example at
the end of the article. So there was an example demonstrating a multi-leg
query. LCA stands for Lowest Common Ancestor and also goes by a couple of
less used acronyms.
Sorry, I only got to about page 6
before giving up. Page 6 has a "Conclusion", namely "Even
XQuery designed from the ground up to process XML structures requires
procedural navigation keeping its processing limited to linear processing
for the most part." which I think is so fundamentally wrong (and
it's not exactly a pleasure to read either) that I didn't feel it was
worth reading on to the Appendix.
The paragraph this is mentioned in is
talking about the XML industry has yet to advance to full nonlinear
multi-leg hierarchical processing because separate navigations of linear
legs can not be practically joined to preserve and process full
hierarchical LCA queries in XQuery. XQuery also supports the inner join
which flattens hierarchical structures invalidating hierarchical
structures for correct hierarchical output.
The ANSI SQL hierarchical prototype
used in the example can perform full multi-leg (nonlinear) hierarchical
processing by limiting its operation to only defining and operating on
hierarchical structures to produce fully accurate hierarchically
processed multi-leg queries. The standard ANSI SQL engine is performing
this full hierarchical processing. ANSI SQL?s Cartesian product operation
supports full LCA processing when performing hierarchically. The SQL
hierarchical prototype is using a standard ANSI SQL commercially
available processor as its hierarchical processor engine.
I believe that XPath 2.0 is
relationally complete and that it can therefore perform any query that
SQL can perform. In fact it can do more, because it can do some (not all)
recursive queries, which are beyond the capability of SQL without
extensions, and which are very important in processing many kinds of
hierarchical data. For queries of this complexity, however, XQuery is a
much more suitable candidate than XPath. I agree that XPath 1.0 was not
able to perform arbitrary joins - but that's history.
With SQL-92, one sided (hierarchical)
outer joins with separate ON clause joins at each join point, they can
link back on the hierarchical structure being modeled to define multi-leg
structures. SQL?s syntax models the hierarchical structure; SQL?s
semantics processes it hierarchically. I do not beleive XPath 2.0 or
XQuery can match this natural full multi-leg hierarchical data modeling
and processing capability
SQL-99 does support recursive
joins.
The main reason XQuery is more
suitable is that it allows you to write functions, which are the
equivalent of views in SQL (though again they are more powerful because
they can be recursive, which is needed for hierarchical data).
The ANSI SQL hierarchical prototype
uses standard SQL views to define full nonlinear multi-leg hierarchical
structures. The prototype treats them as hierarchical structure meta data
and can adapt processing automatically to each specific query. This
allows it to hierarchically optimize the query request based on the query
selection specification eliminating overhead for global views by
dynamically removing unneeded hierarchical paths from processing.
Hierarchical optimization can only be performed on hierarchically
structured data. This is performed totally before standard SQL
optimization processing is performed. These hierarchical views can be
easily dynamically combined into hierarchical structures also using the
standard one sided outer join. SQL-99 supported recursive structures may
be able to be adapted to full hierarchical recursive structures.
I haven't attempted to code your
example in XQuery because I don't think I have fully understood the query
specification. That's partly because I haven't written any SQL for about
25 years and I find it very hard to remember its arcane syntax and
semantics; also you seem to be using SQL extensions that I haven't come
across before.
With SQL?s arcane syntax, the SQL ON
(join) clause is similar to XPath?s navigation and data
filtering/qualification processing, while the WHERE clause operates
hierarchically across the total nonlinear multi-leg structure being
processed. So there is a big difference between SQL?s ON and WHERE
clauses, especially for hierarchical processing use. Multi-leg queries
require LCA processing as described above to operate automatically in
ANSI SQL. The more hierarchical paths involved in the query, the more
compounded the LCA processing becomes with multiple LCAs that can also be
nested. This is too complex and error prone for XQuery to perform
practically. As some proof of this, google ?XQuery LCA? and you will
locate a number of academic projects that attempted to add LCA processing
on top of XQuery and why they felt the need to.
Regards,
/Mike (David)
========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL)
Phone: +1.801.942.0144
Co-Chair, W3C XML Query WG; XQX (etc.) editor
Fax : +1.801.942.3345
Oracle Corporation Oracle
Email: jim dot melton at oracle dot com
1930 Viscounti Drive Standards email: jim
dot melton at acm dot org
Sandy, UT 84093-1063
USA Personal email:
jim at melton dot name
========================================================================
= Facts are facts. But any opinions expressed are the
opinions =
= only of myself and may or may not reflect the opinions of
anybody =
= else with whom I may or may not have discussed the issues at
hand. =
========================================================================
[Date Prev]
| [Thread Prev]
| [Thread Next]
| [Date Next]
--
[Date Index]
| [Thread Index]