Re: [xml-dev] The limitations of XPath and navigation for XML da

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]

Re: [xml-dev] The limitations of XPath and navigation for XML database processing

From: noah_mendelsohn@us.ibm.com
To: mike@adatinc.com
Date: Thu, 7 Feb 2008 20:33:11 -0500

Michael David writes:

> Your example of XPath /A/B/C being able to be processed 
> internally in any way that is still semantically correct to 
> optimize it could be applied to COBOL and I would not consider 
> COBOL  nonprocedural. 

We certainly agree that COBOL is imperative and not declarative, but I 
don't at all agree that COBOL and XPath are similar in this respect.  It's 
been just a bit over 30 years since I wrote any COBOL, but from my vague 
recollection the following might be pretty close to valid COBOL:

        DIVIDE INCOME BY UNITS-SOLD GIVING INCOME-PER-UNIT.
        ADD INCOME-PER-UNIT TO PROFIT-PER-UNIT GIVING NET-PROFIT.
        DISPLAY NET-PROFIT.

(for those few youngsters on this list who's CS education didn't include 
COBOL: yes, the programs really read like this)

Crucially, the COBOL steps have to be done as if in order.   If the first 
step results in a divide by zero, then you don't perform the second step. 
It's an imperative language -- do this step then this one.  It's certainly 
true that you can do a lot of optimizations under the cover.   For 
example, if you had a hardware system with some parallelism you could 
start fetching PROFIT-PER-UNIT from memory even before trying the divide, 
but you certainly couldn't do anything that would make it appear that the 
second or third steps had been performed if the first one faulted.

The XPath /A/B/C is very different.  It doesn't in any way say "If you 
have trouble while finding an A, then you can't have started looking for a 
B or a C".  It just says:  "All Cs with parents that are Bs that in turn 
have A parents that are immediate children of the root."  No sense of 
"first this step then that step".

It's a very deep difference.  Nobody has ever said you can't do lots of 
optimizations on procedural languages.  The first FORTAN compiler for the 
IBM 704 [1] was a tour-de-force of optimization for what is clearly an 
imperative, procedural language.   As MK has said quite eloquently, 
though, declarative languages like SQL and XPath much more purely separate 
the specification of what is desired from any notion of sequentiality in 
making that specification.  COBOL clearly says which steps to do first; 
XPath doesn't have steps in time sequence, just a specification of the 
desired result.

Here's another way to look at it.  Let's say I have this XML instance:

<A>
  <X/>
  <C/> <!-- no match -->
  <B>
   <C/> <!-- match -->
  </B>
</A>

I can point you to either C element and say "does it satisfy the XPath"? 
If I want, I can just start from the C, look up for its parent B, and then 
find A and so on.  I can also work down from the root and see if I run 
into the C.  Let's say I had instead written the /A/B/C in a truly 
imperative specification, Java-style:

        result = new ListOfElements;
        r = root();
      // Go through all children of root
        for (i=0; i<r.length(); i++) {
        hopingForA = r.getNextChild();
        if (hopingForA == "A")
         // for all A children of root
         for (j=0; j<hopingForA.length(); j++) {
           hopingForB = hopingForA.getNextChild();
             if (hopingForB == "B")
             // for all elements matching /A/B
             for (k=0; k<hopingForB.length(); k++) {
               hopingForC = hopingForB.getNextChild();
                 if (hopingForC == "C")
                    // found an /A/B/C
                    result.append(hopingForC);
             }

         }
        }
      return result;

Now I again point you to one of the C elements in the instance and say "is 
there a strategy you can use to test this?"  All you can do is run the 
program from the beginning and see if it adds the C you were looking for 
to the result.  You have to go step by step.  You can indeed do some minor 
optimizations around the loop indicies, but the fundamental algorithm is 
specified step by step and it's very hard to get away from that.  Perhaps 
a truly brillant optimizer would recognize the equivalence to the 
declarative specification of the path, but in the general case you're 
unlikely to succeed at that.

XPath is not imperative, it's declarative.  COBOL is imperative.

Noah

[1] 
http://web.mit.edu/6.035/www/papers/BackusEtAl-FortranAutomaticCodingSystem-1957.pdf
--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

References:
- Re: [xml-dev] The limitations of XPath and navigation for XML database processing
  - From: mike@adatinc.com

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]