[Date Prev]
| [Thread Prev]
| [Thread Next]
| [Date Next]
--
[Date Index]
| [Thread Index]
RE: [xml-dev] The limitations of XPath and navigation for XML database processing
- From: "Michael Kay" <mike@saxonica.com>
- To: "'Jim Melton'" <jim.melton@acm.org>,<mike@adatinc.com>
- Date: Fri, 8 Feb 2008 23:58:00 -0000
Yes, sorry, I was using the the term "SQL" as a
synonym for relational, which is of course incorrect. I was thinking
of "relational" in the sense of Ted Codd, and in that sense a lot of things that
have been added over the years (like recursion) are indeed extensions to the
relational model. And I was using "relationally complete" in the sense of Ted
Codd's original papers - first order predicate calculus, no recursion. Sorry
that my carelessness caused confusion.
Michael Kay
I just have to leap into this discussion, since SQL is being
bandied about ;^) (As readers will either know already or see in my
signature block at the end of this message, I have a very close relationship
with SQL that has lasted more than two decades.)
First, I think that I
agree with Mike Kay on one important point. He said "I believe that
XPath 2.0 is relationally complete and that it can therefore perform any query
that SQL can perform." XPath 2.0, along with XQuery 1.0, was designed to
be a complete expression language and to be computationally complete.
I'm not quite certain what Mike means by "relationally complete", but I assume
that he means that the language can perform operations equivalent to those
defined by the relational model.
Second, I have to disagree with Mike
Kay on another very important point. He went on to say "In fact it can
do more, because it can do some (not all) recursive queries, which are beyond
the capability of SQL without extensions, and which are very important in
processing many kinds of hierarchical data." This statement is simply
not true at all. Standard SQL has had recursive querying capabilities --
meaning *without extensions* -- since SQL:1999 was published (which Mike David
mentioned). A number of SQL products have that capability built in
(although some use non-standard syntax that accomplishes the goal). Of
course, SQL:2003 also has that feature, and the soon-to-be-published SQL:2008
will have it as well.
Finally, Mike Kay's statement that "...I haven't
written any SQL for about 25 years" caught my eye. If my arithmetic is
correct, that would be about 1983, long before SQL-86 was published, and about
the time that commercial products really began to appear. Things have
changed a lot since then, Mike ;^) Seriously, I hope that your use of
the word "extensions" doesn't imply that you think that all SQL features added
since SQL-86 are "extensions".
Enjoying the thread,
Jim
At 2/8/2008 01:46 PM, mike@adatinc.com wrote:
My reply to Michael Kay is in
red.
>XPath user navigation for selecting data
going down a path does require coding the data types in the order they are
retrieved.
Sorry that simply isn't true. You
can (and products do) implement a path expression such as A/B[@id='4']/C
using a wide variety of different access strategies. You do not have to "go
down the path" from the root to the leaves; you can start at the leaves and
work up, or start in the middle and work both ways. This expression is not a
sequence of step-by-step instructions, it is a request to find C elements
that have a B parent whose @id is 4 and whose parent is an A: it is a
declarative specification of the characteristics of the elements that you
want to retrieve. Suggesting this is procedural is like suggesting that
regular expressions are procedural - you've confused the specification and
your first instinct at a naive algorithm for implementing
it.
I will say I
should have used the word ?encountered? instead ?retrieved? in my statement
above. A before B, B before C. The user needs to know the structure and
specify the navigation in that order. As an external user communicating what
I want, A/B[@ID=?4?]/C, says: starting at the As, goto Bs with an ID of ?4?,
and then select all Cs with the qualified B parents. How it is performed
internally is not the user?s concern as long as the internal processing
semantics match the external operational semantics. In SQL: ?SELECT C.vals
FROM ABCview WHERE B.id=?4?? is a more nonprocedural having separated
structure definition from the data request no longer requires the user to
have knowledge of structure enabling their query to be free of the
structure. This is navigationless processing. This does not imply that no
navigation is occurring internally.
>Multi-leg queries (known as LCA queries)
such as selecting data from one leg of a hierarchical structure based on
data in another leg of the structure was used in the example at the end of
the article. So there was an example demonstrating a multi-leg query. LCA
stands for Lowest Common Ancestor and also goes by a couple of less used
acronyms.
Sorry, I only got to about page 6 before
giving up. Page 6 has a "Conclusion", namely "Even XQuery designed from the
ground up to process XML structures requires procedural navigation keeping
its processing limited to linear processing for the most part." which I
think is so fundamentally wrong (and it's not exactly a pleasure to read
either) that I didn't feel it was worth reading on to the
Appendix.
The paragraph this is mentioned in is talking
about the XML industry has yet to advance to full nonlinear multi-leg
hierarchical processing because separate navigations of linear legs can not
be practically joined to preserve and process full hierarchical LCA queries
in XQuery. XQuery also supports the inner join which flattens hierarchical
structures invalidating hierarchical structures for correct hierarchical
output.
The
ANSI SQL hierarchical prototype used in the example can perform full
multi-leg (nonlinear) hierarchical processing by limiting its operation to
only defining and operating on hierarchical structures to produce fully
accurate hierarchically processed multi-leg queries. The standard ANSI SQL
engine is performing this full hierarchical processing. ANSI SQL?s Cartesian
product operation supports full LCA processing when performing
hierarchically. The SQL hierarchical prototype is using a standard ANSI SQL
commercially available processor as its hierarchical processor engine.
I believe
that XPath 2.0 is relationally complete and that it can therefore perform
any query that SQL can perform. In fact it can do more, because it can do
some (not all) recursive queries, which are beyond the capability of SQL
without extensions, and which are very important in processing many kinds of
hierarchical data. For queries of this complexity, however, XQuery is a much
more suitable candidate than XPath. I agree that XPath 1.0 was not able to
perform arbitrary joins - but that's history.
With
SQL-92, one sided (hierarchical) outer joins with separate ON clause joins
at each join point, they can link back on the hierarchical structure being
modeled to define multi-leg structures. SQL?s syntax models the hierarchical
structure; SQL?s semantics processes it hierarchically. I do not beleive
XPath 2.0 or XQuery can match this natural full multi-leg hierarchical data
modeling and processing capability
SQL-99 does support recursive
joins.
The
main reason XQuery is more suitable is that it allows you to write
functions, which are the equivalent of views in SQL (though again they are
more powerful because they can be recursive, which is needed for
hierarchical data).
The ANSI SQL hierarchical prototype uses
standard SQL views to define full nonlinear multi-leg hierarchical
structures. The prototype treats them as hierarchical structure meta data
and can adapt processing automatically to each specific query. This allows
it to hierarchically optimize the query request based on the query selection
specification eliminating overhead for global views by dynamically removing
unneeded hierarchical paths from processing. Hierarchical optimization can
only be performed on hierarchically structured data. This is performed
totally before standard SQL optimization processing is performed. These
hierarchical views can be easily dynamically combined into hierarchical
structures also using the standard one sided outer join. SQL-99 supported
recursive structures may be able to be adapted to full hierarchical
recursive structures.
I haven't attempted to code your example in
XQuery because I don't think I have fully understood the query
specification. That's partly because I haven't written any SQL for about 25
years and I find it very hard to remember its arcane syntax and semantics;
also you seem to be using SQL extensions that I haven't come across
before.
With
SQL?s arcane syntax, the SQL ON (join) clause is similar to XPath?s
navigation and data filtering/qualification processing, while the WHERE
clause operates hierarchically across the total nonlinear multi-leg
structure being processed. So there is a big difference between SQL?s ON and
WHERE clauses, especially for hierarchical processing use. Multi-leg queries
require LCA processing as described above to operate automatically in ANSI
SQL. The more hierarchical paths involved in the query, the more compounded
the LCA processing becomes with multiple LCAs that can also be nested. This
is too complex and error prone for XQuery to perform practically. As some
proof of this, google ?XQuery LCA? and you will locate a number of academic
projects that attempted to add LCA processing on top of XQuery and why they
felt the need to.
Regards,
/Mike (David)
========================================================================
Jim
Melton --- Editor of ISO/IEC 9075-* (SQL) Phone:
+1.801.942.0144
Co-Chair, W3C XML Query WG; XQX (etc.)
editor Fax : +1.801.942.3345
Oracle
Corporation Oracle Email: jim dot
melton at oracle dot com
1930 Viscounti Drive
Standards email: jim dot melton at acm dot org
Sandy, UT 84093-1063
USA Personal email: jim
at melton dot
name
========================================================================
=
Facts are facts. But any opinions expressed are the
opinions =
= only of myself and may or
may not reflect the opinions of anybody =
= else with
whom I may or may not have discussed the issues at hand.
=
========================================================================
[Date Prev]
| [Thread Prev]
| [Thread Next]
| [Date Next]
--
[Date Index]
| [Thread Index]