Re: [xml-dev] The limitations of XPath and navigation- A XPath/XQuery Challenge
From: John Snelson <john.snelson@oracle.com>
To: mike@adatinc.com
Date: Tue, 12 Feb 2008 12:12:58 +0000

Hi Mike,

Looking at this example it's not surprising that SQL handles it well, 
since it uses a very relational style. I hope it's already clear that 
XQuery can easily handle the join that your query does.
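
To make the join concrete, here is a minimal sketch of it in Python, used here only so that it can be run as-is; the nested iteration is the same shape a FLWOR expression would take. Note the assumptions: the sample documents carry no EmpCustID attribute, so the sketch guesses that an emp and a cust are related when they share an address ID (eaddrid = addrid). That join key, and the function name join_emp_cust, are illustrative and not taken from your post.

```python
import xml.etree.ElementTree as ET

EMP_XML = """<root>
  <emp empid="Emp01">
    <dpnd dpndid="Dpnd01"/>
    <eaddr eaddrid="Addr01"/>
  </emp>
  <emp empid="Emp02">
    <eaddr eaddrid="Addr03"/>
  </emp>
</root>"""

CUST_XML = """<root>
  <cust custid="Cust01">
    <invoice invid="Inv01"/>
    <invoice invid="Inv02"/>
    <addr addrid="Addr01"/>
  </cust>
  <cust custid="Cust02">
    <invoice invid="Inv03"/>
    <addr addrid="Addr02"/>
    <addr addrid="Addr04"/>
  </cust>
  <cust custid="Cust03">
    <addr addrid="Addr03"/>
  </cust>
</root>"""


def join_emp_cust(emp_xml, cust_xml, invoice_id):
    """Pair each emp with related cust elements, keeping only one invoice."""
    emp_root = ET.fromstring(emp_xml)
    cust_root = ET.fromstring(cust_xml)
    result = ET.Element("root")
    for emp in emp_root.findall("emp"):
        emp_addrs = {a.get("eaddrid") for a in emp.findall("eaddr")}
        out_emp = None  # created lazily, once per emp that survives the filter
        for cust in cust_root.findall("cust"):
            # ASSUMPTION: join on a shared address ID (not stated in the post).
            if not any(a.get("addrid") in emp_addrs for a in cust.findall("addr")):
                continue
            # WHERE filter: keep only the requested invoice.
            invoices = [i for i in cust.findall("invoice")
                        if i.get("invid") == invoice_id]
            if not invoices:
                continue
            if out_emp is None:
                out_emp = ET.SubElement(result, "emp", dict(emp.attrib))
                for child in emp:
                    ET.SubElement(out_emp, child.tag, dict(child.attrib))
            out_cust = ET.SubElement(out_emp, "cust", dict(cust.attrib))
            for node in invoices + cust.findall("addr"):
                ET.SubElement(out_cust, node.tag, dict(node.attrib))
    return result


result = join_emp_cust(EMP_XML, CUST_XML, "Inv02")
print(ET.tostring(result, encoding="unicode"))
```

Under that assumed join key, the two sample documents yield a single emp (Emp01) containing its dpnd and eaddr children plus Cust01 with invoice Inv02 and address Addr01. The point is only that the join itself is a few lines of explicit iteration and filtering.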

The first obvious difference with your SQL query is that it's producing 
an automatically formatted result. I would suggest that users actually 
want their results in one of two ways, either:

1) In a specific output format that they know in advance.

2) As references into the actual XML tree that the results come from.

The latter allows users to further examine the structure of the original 
data in any direction. It seems to me that the automatic result tree 
produced by your query is a poor man's replacement for both of the above 
use cases - you aren't getting results in a specific known format, and 
you can't examine the structure of the original document having got 
those results. Of course, XQuery can easily handle both of these use cases.

The second obvious difference is that your SQL query is using what I 
assume you call lowest common ancestor processing to determine that when 
the user asks for DpndID and EmpID, what they actually want is an "emp" 
element, the lowest common ancestor that contains these attributes. This 
seems like a fairly big assumption to make, and apparently opens up the 
query to both false positives and false negatives - for instance when 
the name EmpID is used for attributes in more than one place in the 
document.
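
A small sketch of that ambiguity, again in Python so it can be run directly. The document below is made up for illustration (the dept and manager elements are not in your data), and the helper names are hypothetical; it simply reuses the attribute name empid in a second place and computes the lowest common ancestor for every pairing of a dpndid holder with an empid holder.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "empid" appears on both <emp> and <manager>.
DOC = """<root>
  <emp empid="Emp01">
    <dpnd dpndid="Dpnd01"/>
  </emp>
  <dept>
    <manager empid="Emp01"/>
  </dept>
</root>"""


def ancestors_or_self(node, parent_of):
    """Return the node followed by each of its ancestors, nearest first."""
    path = [node]
    while node in parent_of:
        node = parent_of[node]
        path.append(node)
    return path


def lowest_common_ancestor(a, b, parent_of):
    """Return the deepest element that contains (or is) both a and b."""
    b_path = ancestors_or_self(b, parent_of)
    for node in ancestors_or_self(a, parent_of):
        if node in b_path:
            return node
    return None


def lca_candidates(xml_text, attr_a, attr_b):
    """List the LCA tag for every pairing of attr_a and attr_b holders."""
    root = ET.fromstring(xml_text)
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    holders_a = [e for e in root.iter() if attr_a in e.attrib]
    holders_b = [e for e in root.iter() if attr_b in e.attrib]
    return [lowest_common_ancestor(a, b, parent_of).tag
            for a in holders_a for b in holders_b]


print(lca_candidates(DOC, "dpndid", "empid"))
```

For this made-up document the candidates are ['emp', 'root']: one pairing gives the intended "emp" element, the other resolves all the way up to the document root. Any automatic rule has to pick between them, which is exactly where the false positives and false negatives come from.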

So I think your main point is that XQuery can't express your SQL query 
without explicitly selecting the common ancestor to use. In that case 
you would be right, but I'm not sure that's a bad thing.

John

mike@adatinc.com wrote:
> Here is the information and XML data for translating the ANSI SQL 
> hierarchical query to XQuery.
> 
>  
> 
>  From the comments I received overall, I can see that what I call full 
> nonlinear hierarchical processing (a query involving processing across 
> multiple paths of the hierarchical structure) is not well understood 
> today. This happened with the advent of relational processing. Before 
> there was relational processing, I was designing commercial immediate 
> response full hierarchical query products. The multiple pathways of the 
> hierarchical structures were generally referred to as legs, and our 
> customers knew how to join multiple entire structures (i.e. 
> joining full XML documents) with a simple single join operation, as you 
> will notice in the SQL below. The processing principles of full 
> hierarchical processing involving any number of legs were known and 
> followed automatically internally. The user access was navigationless, as 
> is SQL.
> 
>  
> 
> Below are two simple XML documents, named EmpView and CustView. In the 
> example in my article, I used the ANSI SQL statement shown below, which 
> I explain below it. The two XML document structures have been defined to 
> the SQL query as two standard SQL left outer views of the same name as 
> their corresponding XML documents. The SQL standard LEFT OUTER JOIN was 
> used to define each hierarchical structure in its entirety. These global 
> SQL hierarchical views carry no overhead at execution; unneeded 
> pathways are automatically not accessed (left join hierarchical data 
> preservation enables this). This allows the query user to dynamically 
> pick and choose, in any order, which data fields are desired for output 
> without any other changes being required and without incurring the 
> unnecessary overhead of paths not needed.
> 
>  
> 
> EmpView XML
> 
> <root>
>   <emp empid="Emp01">
>    <dpnd dpndid="Dpnd01"/>
>    <eaddr eaddrid="Addr01"/>
>   </emp>
>   <emp empid="Emp02">
>    <eaddr eaddrid="Addr03"/>
>   </emp>
> </root>
> 
> CustView XML
> 
> <root>
>  <cust custid="Cust01">
>    <invoice invid="Inv01"/>
>    <invoice invid="Inv02"/>
>    <addr addrid="Addr01"/>
>  </cust>
>  <cust custid="Cust02">
>    <invoice invid="Inv03"/>
>    <addr addrid="Addr02"/>
>    <addr addrid="Addr04"/>
>  </cust>
>  <cust custid="Cust03">
>    <addr addrid="Addr03"/>
>  </cust>
> </root>
> 
> This is the ANSI SQL multi-path query that will be translated into XQuery:
> 
>  
> 
> SELECT DpndID, CustID, EmpID, InvID, AddrID
>   FROM EmpView LEFT JOIN CustView ON EmpCustID=CustID
>  WHERE Invoice='Inv02'
> 
>  
> 
> The query description given here is the default SQL operation. The data 
> fields that are desired in the hierarchical XML result are specified in 
> the SELECT statement. The selected fields stay within their XML elements 
> when output (attribute mode is the default). Node promotion will occur 
> automatically around empty XML elements (with no data selected) in the 
> output node. The FROM clause identifies the input objects and how they 
> are related, which is indicated in the ON clause. In this case the 
> EmpView XML document is (left) joined over the CustView XML document, 
> linked by the EmpCustID and CustID data values. The combined hierarchical 
> structure is then filtered by the WHERE clause filter of 
> Invoice='Inv02'. Referenced fields in the SQL query do not have to be 
> selected for output.
> 
>  
> 
> There is no user navigation performed. The SQL hierarchical processor 
> automatically determines the combined structure being processed and the 
> output structure by analyzing the specific SQL query. If the EmpID field 
> is removed from the SELECT list, its emp element would be eliminated 
> from the output structure using node promotion. This dynamic change in 
> the output structure required no changes other than removing EmpID from 
> the SELECT list. The reverse is true for adding a field to the query's 
> SELECT list: the field is added to the output, adding a node correctly if 
> necessary.
> 
>  
> 
> Node promotion is a natural relational operation performed by relational 
> projection occurring in the underlying SQL processing. Other 
> hierarchical operations such as node collection and hierarchical node 
> preservation from the left join are also relationally supported.
> 
>  
> 
> As you can see, this full hierarchical processing is being performed at 
> a high hierarchical conceptual level and can be specified easily and 
> interactively by nontechnical users. In XQuery terms, the FLWOR 
> expression and XML output template are being automatically created 
> and performed accurately regardless of the internal processing 
> complexity. So adding a new field to the SELECT list, which will cause a 
> new hierarchical path to be accessed and complicate the 
> hierarchical processing further, is still all that is needed. Because the 
> output structure is known, the proper data replication removal is performed.
> 
>  
> 
> Result XML
> 
> <root>
>  <emp empid="Emp01">
>   <dpnd dpndid="Dpnd01"/>
>   <eaddr eaddrid="Addr01"/>
>   <cust custid="Cust01">
>    <invoice invid="Inv02"/>
>    <addr addrid="Addr01"/>
>   </cust>
>  </emp>
> </root>
> 
> 
> This is just a simple example. But it may also answer the question 
> raised previously of why joining two nonlinear (multi-leg) structures 
> may be useful. A user has two different documents that they want to link 
> together hierarchically in a particular way, perform processing that 
> involves many different legs in both structures, and produce a new full 
> nonlinear (multi-leg) structure. Using the ANSI SQL hierarchical 
> prototype, this navigation-intensive problem becomes simple when 
> performed navigationlessly.
> 
>  
> 
> You have the two XML input documents above and the description of what 
> processing the SQL hierarchical query performed. You can compare your 
> XQuery result to the output XML.
> 
>  
> 
> I am not that familiar with how the hierarchical nested row sets used by 
> XQuery perform. So getting the correct hierarchical processing may fall 
> into place, or it may require additional coordination between the 
> processed legs (paths) to keep the results meaningful. Let me know if you 
> need more information or have any other questions.
> 
>  
> 
>             /Mike
> 
>  
> 


-- 
John Snelson, Oracle Corporation            http://snelson.org.uk/john
Berkeley DB XML:        http://www.oracle.com/database/berkeley-db/xml
XQilla:                                  http://xqilla.sourceforge.net