[Date Prev]
| [Thread Prev]
| [Thread Next]
| [Date Next]
--
[Date Index]
| [Thread Index]
Re: [xml-dev] The limitations of XPath and navigation for XML database processing
- From: noah_mendelsohn@us.ibm.com
- To: mike@adatinc.com
- Date: Thu, 7 Feb 2008 20:33:11 -0500
Michael David writes:
> Your example of XPath /A/B/C being able to be processed
> internally in any way that is still semantically correct to
> optimize it could be applied to COBOL and I would not consider
> COBOL nonprocedural.
We certainly agree that COBOL is imperative and not declarative, but I
don't at all agree that COBOL and XPath are similar in this respect. It's
been just a bit over 30 years since I wrote any COBOL, but from my vague
recollection the following might be pretty close to valid COBOL:
DIVIDE INCOME BY UNITS-SOLD GIVING INCOME-PER-UNIT.
ADD INCOME-PER-UNIT TO PROFIT-PER-UNIT GIVING NET-PROFIT.
DISPLAY NET-PROFIT.
(for those few youngsters on this list who's CS education didn't include
COBOL: yes, the programs really read like this)
Crucially, the COBOL steps have to be done as if in order. If the first
step results in a divide by zero, then you don't perform the second step.
It's an imperative language -- do this step then this one. It's certainly
true that you can do a lot of optimizations under the cover. For
example, if you had a hardware system with some parallelism you could
start fetching PROFIT-PER-UNIT from memory even before trying the divide,
but you certainly couldn't do anything that would make it appear that the
second or third steps had been performed if the first one faulted.
The XPath /A/B/C is very different. It doesn't in any way say "If you
have trouble while finding an A, then you can't have started looking for a
B or a C". It just says: "All Cs with parents that are Bs that in turn
have A parents that are immediate children of the root." No sense of
"first this step then that step".
It's a very deep difference. Nobody has ever said you can't do lots of
optimizations on procedural languages. The first FORTAN compiler for the
IBM 704 [1] was a tour-de-force of optimization for what is clearly an
imperative, procedural language. As MK has said quite eloquently,
though, declarative languages like SQL and XPath much more purely separate
the specification of what is desired from any notion of sequentiality in
making that specification. COBOL clearly says which steps to do first;
XPath doesn't have steps in time sequence, just a specification of the
desired result.
Here's another way to look at it. Let's say I have this XML instance:
<A>
<X/>
<C/> <!-- no match -->
<B>
<C/> <!-- match -->
</B>
</A>
I can point you to either C element and say "does it satisfy the XPath"?
If I want, I can just start from the C, look up for its parent B, and then
find A and so on. I can also work down from the root and see if I run
into the C. Let's say I had instead written the /A/B/C in a truly
imperative specification, Java-style:
result = new ListOfElements;
r = root();
// Go through all children of root
for (i=0; i<r.length(); i++) {
hopingForA = r.getNextChild();
if (hopingForA == "A")
// for all A children of root
for (j=0; j<hopingForA.length(); j++) {
hopingForB = hopingForA.getNextChild();
if (hopingForB == "B")
// for all elements matching /A/B
for (k=0; k<hopingForB.length(); k++) {
hopingForC = hopingForB.getNextChild();
if (hopingForC == "C")
// found an /A/B/C
result.append(hopingForC);
}
}
}
return result;
Now I again point you to one of the C elements in the instance and say "is
there a strategy you can use to test this?" All you can do is run the
program from the beginning and see if it adds the C you were looking for
to the result. You have to go step by step. You can indeed do some minor
optimizations around the loop indicies, but the fundamental algorithm is
specified step by step and it's very hard to get away from that. Perhaps
a truly brillant optimizer would recognize the equivalence to the
declarative specification of the path, but in the general case you're
unlikely to succeed at that.
XPath is not imperative, it's declarative. COBOL is imperative.
Noah
[1]
http://web.mit.edu/6.035/www/papers/BackusEtAl-FortranAutomaticCodingSystem-1957.pdf
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
[Date Prev]
| [Thread Prev]
| [Thread Next]
| [Date Next]
--
[Date Index]
| [Thread Index]