OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.


Help: OASIS Mailing Lists Help | MarkMail Help

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]
Re: [xml-dev] The limitations of XPath and navigation- A XPath/XQuery Challenge

Hi Mike,


Yes the SQL WHERE clause should have read: WHERE InvID=?Inv02?


With SQL-92, The Outer join?s ON clause, performs the joins. It is performed at each join point and only has affect over its join point- only downward, not upward. The WHERE clause is applied (logically) after all joins are performed similar to XQuery?s WHERE operation. This will change your XQuery example. It may increase LCA logic problems across documents. Is there any documentation which addresses processing across hierarchical paths?


Yes I did leave out the empCustId attribute, and your use is correct.


The SQL used in my SQL example is ANSI standard.


The SQL hierarchical prototype automatically determines the full hierarchical structure being processed and from that the default output XML structure produced from the processed result.


The default order of sibling nodes is automatically taken from the stored views and order of joins performed. The order of attributes in a node is determined by the order they were specified on the SL SELECT list. That was used because that?s how SQL generally operates and is very useful by default. This could be controlled easily by another FOR XML switch for retaining attribute order in a node.


As primarily an SQL product which is naturally navigationless, the default XML output structure should be automatically taken from the hierarchically processed relational rowset. This seems perfectly natural to me. If the user whishes to add their own format or transform they can do it directly in SQL, either by dynamically controlling what data is desired modifying the SELECT list or transforming the structure using SQL. So why preclude the automatic default feature.


On the debate of Implicit and Explicit Navigation: My original premise was that database data and markup need to be processed differently. With database data, the hierarchical structure and its processing needs to be more fixed and restricted in order to get the correct results. This allows database data to be performed navigationlessly and if this is possible, that should be the first choice. This also opens up XML access to the non technical user. It also has many advantages such as accuracy and not being affected by internal complexity, and all the other capabilities mentioned in my article.




-----Original Message-----
From: Michael Kay [mailto:mike@saxonica.com]
Sent: Tuesday, February 12, 2008 06:31 AM
To: mike@adatinc.com, john.snelson@oracle.com, james.fuller.2007@gmail.com
Cc: xml-dev@lists.xml.org
Subject: RE: [xml-dev] The limitations of XPath and navigation- A XPath/XQuery Challenge

This is the ANSI SQL multi-path query that will be translated into XQuery:

SELECT DpndID, CustID, EmpID, InvID, AddrID
FROM EmpView LEFT JOIN CustView ON EmpCustID=CustID
WHERE Invoice="Inv02"

The query description given here is the default SQL operation. The data fields that are desired in the hierarchical XML result are specified in the SELECT statement. The selected fields stay within their XML elements when output (attribute mode is the default). Node promotion will occur automatically around empty XML elements (with no data selected) in the output node. The FROM clause identifies the input objects and how they are related which is indicated on the ON clause. In this case the EmpView XML document is (left) joined over the Custview XML document linked by the EmcustID and CustID data values. The combined hierarchical structure is then filtered by the WHERE clause filter of Invoice='Inv02'. Referenced fields in the SQL query do not have to be selected for output.

MK>I'm struggling with this because I can't see what EmpCustID might be. I've come to the conclusion that you left out an attribute of the element and it should have been . If I got that wrong, sorry, please correct me.

Also you haven't explained in English what the query is supposed to do, so I'm having to rely on my very rusty knowledge of SQL to understand it. And I'm afraid I'm confused. I thought the idea of LEFT JOIN was that every EMP would appear in the result whether or not there was a matching CUST? But perhaps the WHERE clause overrides this - in which case it's surely the same as an ordinary join? Sorry, I'm really struggling to reverse-engineer the query specification from a language I don't fully understand.

When you write Invoice="Inv02" I would have expected to see Invoice.InvID="Inv02". Are you allowing a table name used in a comparison to be treated as a reference to its primary key?

Anyway, with your sample data you can get the supplied output using:

declare variable $emps := doc('employees.xml')/root/emp;
declare variable $custs := doc('customers.xml')/root/cust;

 for $e in $emps,
 $c in $custs[@custid = $e/@empcustid],
  $i in $c/invoice[@invid = "Inv02"]
  {$e/@*, $e/*}
  {$c/@*, $i, $c/addr}
} </root>

As you can see, this full hierarchical processing is being performed at a high hierarchical conceptual level and can be specified easily and interactively by non technical users. In XQuery terms, the FLOWR expression and output XML output template is being automatically created and performed accurately regardless of the internal processing complexity. So adding a new field in the SELECT list that will cause a new hierarchical path to also be accessed and complicate the hierarchical processing further is still all that is needed. Because the output structure is known, the proper data replication removal is performed.

MK> The main differences seem to be (a) that in the SQL version (I'm not actually clear now whether this is standard SQL or some variant of your own invention?) you are implicitly qualifying names appearing in the query, and implicitly adding the relationship between customer and invoice, based on knowledge of the schema, and (b) in the SQL version you are making guesses about the required output structure, for example that within the output element, the child should appear before <addr>.

Generally, my feeling looking at these is that the explicit construction of the result hierarchy in the XQuery solution feels the right thing to do; my gut feeling is that if the system were left to design its own hierarchy it would not usually produce what the user wants. On the other hand, I have rather mixed feelings about making the hierarchic paths to input data less explicit. It's true that it's very easy to make mistakes (I had several attempts at getting this right because I wrote $e/empcustid instead of $e/@emcpcustid). However, use of schema-aware query processing provides quick diagnosis of such errors (and without a schema, there's no chance of doing the implicit navigation anyway).

So the debate seems to boil down to a question of whether implicit navigation is better than explicit navigation: whether the system should guess that when I say invoiceId I must mean $c/invoices/invoice/invoiceId. I can see arguments both ways on that. XQuery of course allows you to say $c//invoiceId if you know it's unambiguous - but you do have to say that you're looking in the hierarchy rooted at $c.

Michael Kay

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]

News | XML in Industry | Calendar | XML Registry
Marketplace | Resources | MyXML.org | Sponsors | Privacy Statement

Copyright 1993-2007 XML.org. This site is hosted by OASIS