Re: [xml-dev] The limitations of XPath and navigation- A XPath/XQuery Challenge

Hi John,

 

The ANSI SQL hierarchical processor is first and foremost a SQL processor for SQL users; it can also process native XML and produce a hierarchically correct result in XML. Its look and feel stays as close to SQL as possible. By default, the hierarchically processed result set is converted to hierarchically structured XML that reflects SQL’s look and feel and its external rowset, which naturally resembles the hierarchical structure that was processed. XQuery and SQL/XML used within SQL are not seamless and introduce navigation. SQL/XML functions require embedding and/or subselects to format even the structure that was used in the processing.

 

I will not go into all the FOR XML switches the SQL hierarchical XML product has, but there is a switch that overrides node promotion in the default XML output so that all the intervening empty nodes are included. I think this satisfies one of your output suggestions. It also allows the structure of the input to be examined, unless the input is joined to another structure, in which case the total processed structure is output, as would be expected. So the default output is condensed and modeled after the hierarchical processing specified, which makes it well suited for interactive use. Replicated data is always properly removed when necessary.

 

Once the SQL hierarchical processor product was up and running, I played around with SQL’s ability to isolate and process different pieces of the working set in memory, and was able to perform any-to-any structure transformations using either data relationships (restructuring) or just the structure semantics (reshaping). I know these terms are used interchangeably, but I think their names represent different types of transforms. Semantic reshaping can also be coded to perform polymorphic transforms, so that different input structures can be transformed to the same output structure. These transforms support full hierarchical structures in and out and maintain their hierarchical accuracy. They change the structure dynamically, and the structure is fully tracked at all times. This may satisfy your second suggestion.

 

I believe your understanding of the LCA and its use is incorrect. The lowest common ancestor (LCA) node is the lowest node that is an ancestor of two nodes on different pathways (legs). The LCA’s current node data occurrence establishes the meaningful range of node data occurrences of the two nodes under it. This set of node occurrences is meaningfully related for operations involving the two nodes. The DpndID attribute is located directly under the EmpID attribute. They are on the same pathway, so there is no need for LCA logic. DpndID and EaddrID are on different pathways, and their LCA is the EMP node. So for each data occurrence of the EMP node, all of the data occurrences of Dpnd and Eaddr under the current EMP node data occurrence are meaningfully related and are processed together. For example, in WHERE DpndID=1 and EaddrID=2, this is the range of combinations that will be tested. Including data values from outside this range is not meaningful and will produce invalid results. Another use of the LCA is in the SELECT operation. This is not as clear in XQuery, but it should still apply for processing. In SQL, when the condition in SELECT DpndID WHERE EaddrID=2 is true, all occurrences of DpndID under the current EMP node data occurrence are selected for output.
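
To make the LCA scoping above concrete, here is a minimal sketch in Python, not the SQL processor itself. The sample document, the element names (emp, dpnd, eaddr) and the function names are assumptions made up for illustration; only the attribute names DpndID, EaddrID and EmpID come from the discussion. It tests WHERE DpndID=x AND EaddrID=y once per emp occurrence, and for contrast once across the whole document, which is the invalid case.

```python
import xml.etree.ElementTree as ET
from itertools import product

# Hypothetical sample data: dpnd and eaddr are sibling legs under emp,
# so emp is their lowest common ancestor.
DOC = """
<root>
  <emp EmpID="10"><dpnd DpndID="1"/><dpnd DpndID="3"/><eaddr EaddrID="5"/></emp>
  <emp EmpID="20"><dpnd DpndID="4"/><eaddr EaddrID="2"/></emp>
  <emp EmpID="30"><dpnd DpndID="1"/><dpnd DpndID="7"/>
                  <eaddr EaddrID="2"/><eaddr EaddrID="9"/></emp>
</root>
"""
root = ET.fromstring(DOC)

def scoped_match(root, dpnd_id, eaddr_id):
    """WHERE DpndID=... AND EaddrID=..., tested per emp (LCA) occurrence."""
    hits = []
    for emp in root.iter("emp"):
        dpnds = [d.get("DpndID") for d in emp.findall("dpnd")]
        eaddrs = [e.get("EaddrID") for e in emp.findall("eaddr")]
        # Only combinations under the same emp occurrence are tested.
        if any(d == dpnd_id and e == eaddr_id for d, e in product(dpnds, eaddrs)):
            hits.append(emp.get("EmpID"))
    return hits

def unscoped_match(root, dpnd_id, eaddr_id):
    """Same test with no LCA scoping: values from different emps are mixed."""
    dpnds = [d.get("DpndID") for d in root.iter("dpnd")]
    eaddrs = [e.get("EaddrID") for e in root.iter("eaddr")]
    return any(d == dpnd_id and e == eaddr_id for d, e in product(dpnds, eaddrs))

def select_dpnds_where_eaddr(root, eaddr_id):
    """SELECT DpndID WHERE EaddrID=...: all DpndIDs under each qualifying emp."""
    out = []
    for emp in root.iter("emp"):
        if any(e.get("EaddrID") == eaddr_id for e in emp.findall("eaddr")):
            out.extend(d.get("DpndID") for d in emp.findall("dpnd"))
    return out

print(scoped_match(root, "1", "2"))         # ['30']
print(scoped_match(root, "3", "2"))         # []
print(unscoped_match(root, "3", "2"))       # True, but only by mixing emp 10 and emp 20
print(select_dpnds_where_eaddr(root, "2"))  # ['4', '1', '7']
```

In this sample, DpndID=3 and EaddrID=2 never occur under the same emp, so the scoped test finds nothing, while the unscoped test reports a match that pairs a dependent of one employee with the address of another.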

 

The above description of LCA processing applies to the standard hierarchical processing required for database data. Duplicate database data type names in the document should not be allowed; they should be renamed or always fully qualified. They can cause variable LCAs, and database data should not allow this. This is OK for markup data, but not for database data. Just imagine performing an aggregation of record sales on “SALES”, where a missing record causes the “SALES” of book sales to be added in. This is what you would want for markup use, but not for database data use.
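
A simplified sketch of the duplicate-name hazard, again in Python with made-up data: the store, record and book element names and the figures are assumptions for illustration only. It does not model a variable LCA directly; it only shows how an unqualified lookup on a duplicated name pulls book sales into a total meant for records, while a qualified path does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: "sales" is used under both record and book,
# and the second record has no sales element at all.
DOC = """
<store>
  <record><sales>100</sales></record>
  <record/>
  <book><sales>40</sales></book>
</store>
"""
store = ET.fromstring(DOC)

# Unqualified: every element named "sales" anywhere in the document.
unqualified_total = sum(int(s.text) for s in store.iter("sales"))

# Qualified: only "sales" directly under a "record" child of the store.
qualified_total = sum(int(s.text) for s in store.findall("record/sales"))

print(unqualified_total)  # 140, book sales are added in
print(qualified_total)    # 100
```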

 

    /Mike

 
-----Original Message-----
From: John Snelson [mailto:john.snelson@oracle.com]
Sent: Tuesday, February 12, 2008 07:12 AM
To: mike@adatinc.com
Cc: mike@saxonica.com, james.fuller.2007@gmail.com, xml-dev@lists.xml.org
Subject: Re: [xml-dev] The limitations of XPath and navigation- A XPath/XQuery Challenge

Hi Mike,

Looking at this example it's not surprising that SQL handles it well, since it uses a very relational style. I hope it's already clear that XQuery can easily handle the join that your query does.

The first obvious difference with your SQL query is that it's producing an automatically formatted result. I would suggest that users actually want their results in one of two ways, either:

1) In a specific output format that they know in advance.

2) As references into the actual XML tree that the results come from.

The latter allows users to further examine the structure of the original data in any direction. It seems to me that the automatic result tree produced by your query is a poor man's replacement for both of the above use cases - you aren't getting results in a specific known format, and you can't examine the structure of the original document having got those results. Of course, XQuery can easily handle both of these use cases.

The second obvious difference is that your SQL query is using what I assume you call lowest common ancestor processing to determine that when the user asks for DpndID and EmpID, what they actually want is an "emp" element, the lowest common ancestor that contains these attributes. This seems like a fairly big assumption to make, and apparently opens up the query to both false positives and false negatives - for instance when the name EmpID is used for attributes in more than one place in the document.

So I think your main point is that XQuery can't express your SQL query without explicitly selecting the common ancestor to use. In that case you would be right, but I'm not sure that's a bad thing.

John


