   Re: [xml-dev] [XPath] is it legal ?

  • To: Michael Kay <mike@saxonica.com>
  • Subject: Re: [xml-dev] [XPath] is it legal ?
  • From: Philippe Poulard <Philippe.Poulard@sophia.inria.fr>
  • Date: Fri, 07 Apr 2006 17:08:25 +0200
  • Cc: xml-dev@lists.xml.org
  • In-reply-to: <200604071216.k37CGoqd030079@sophia.inria.fr>
  • References: <200604071216.k37CGoqd030079@sophia.inria.fr>
  • User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.7) Gecko/20050511

Michael Kay wrote:
> I assume we're talking about XPath path expressions here, not XSLT match
> patterns: so it might be better to use the verb "select" rather than
> "match".

Sure ; I'm working on "pattern matching", so I have been somewhat 

Regarding pattern matching, I'm trying to fix some issues with Jaxen.
Consider the following document:
<root>   <a/><!-- --><b/><?pi ip?><b foo='bar'/></root>
When the node under test is the last "b" element, I get the following results:
matched? pattern
-------- -------
false    a
true     b
true     root/b
false    a/b
true     /root/b
true     /root/b[@foo]
false    /root/b[@bar]
true     /root/b[1]
false    /root/b[2]
true     /root/*[1]
false    /root/*[2]
false    /root/*[3]
true     /root/node()[1]
true     /root/node()[5]
true     /root/node()[6]

I will report a bug to the Jaxen team.
When XPath expressions are evaluated, Jaxen works fine, but patterns fail 
when a position is involved in a predicate: Jaxen considers that only 
the first position matches (except in conjunction with node(), which 
always matches). This is because Jaxen creates a context that contains 
only the node under test (when a pattern is considered), rather than the 
nodes selected by the previous step (as it does when an XPath expression 
is considered, which is correct).
As the entry point of a pattern is a single node, I have rewritten some 
classes in order to consider the absolute position of the node, which 
may vary according to the matcher used in the step; here are my results:

false  /root/b[1]
true   /root/b[2]
false  /root/*[1]
false  /root/*[2]
true   /root/*[3]
false  /root/node()[1]
false  /root/node()[5]
true   /root/node()[6]
(I will test the "prefix:*" case later.)
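
The corrected results above can be reproduced with a short sketch. This is 
not Jaxen code: it uses Python's xml.dom.minidom, and the helper names 
(position, matches, is_b, is_element, is_node) are made up for the 
illustration. It only shows the principle, namely that the position is 
computed among the siblings accepted by the node test of the step, not 
in a context that holds the tested node alone.

```python
from xml.dom import minidom

DOC = "<root>   <a/><!-- --><b/><?pi ip?><b foo='bar'/></root>"


def position(node, test):
    """1-based position of `node` among the siblings accepted by `test`."""
    siblings = [n for n in node.parentNode.childNodes if test(n)]
    return next(i for i, n in enumerate(siblings, 1) if n is node)


def matches(node, test, index):
    """True if `node` passes `test` and sits at `index` among such siblings."""
    return test(node) and position(node, test) == index


# Hypothetical node tests standing in for node(), * and b.
def is_node(n):
    return True


def is_element(n):
    return n.nodeType == n.ELEMENT_NODE


def is_b(n):
    return is_element(n) and n.tagName == "b"


root = minidom.parseString(DOC).documentElement
last_b = root.lastChild  # the <b foo='bar'/> element

# Positions under b, * and node() respectively.
print(position(last_b, is_b), position(last_b, is_element),
      position(last_b, is_node))  # 2 3 6
```

With these helpers, matches(last_b, is_b, 1) is false and 
matches(last_b, is_b, 2) is true, in line with the second table.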

Jaxen provides many navigators, and I plan to write a tree walker for 
SAX, to use with both XPath patterns and XPath expressions.
What all of this makes me think is that only a small amount of 
information is worth keeping: the ancestors with their attributes and 
namespace declarations, and their 4 kinds of position:
-the absolute index (when node() is involved)
-the absolute index in the type (*, @*, text(), comment(), pi(), ns::*)
-the absolute index in the "family" (p:*, @p:*)
-the absolute index in the same name (foo, p:foo, @bar, @p:bar, 
pi(target), ns::p)
(The position of the document is 1)
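
As a rough sketch of how these counters could be maintained while 
streaming, here is a handler written with Python's xml.sax rather than 
Java SAX. The class PositionTracker and the function track are 
hypothetical names. Two simplifying assumptions: the "family" is 
approximated by the lexical prefix instead of the namespace URI, and 
attributes and namespace nodes are not counted.

```python
import io
import xml.sax
from collections import Counter
from xml.sax.handler import ContentHandler, property_lexical_handler


class PositionTracker(ContentHandler):
    """Counts sibling positions while streaming; one frame per open parent."""

    def __init__(self):
        super().__init__()
        self.frames = [self._frame()]  # frame 0 holds the document's children
        self.elements = []  # (name, (node, type, family, same-name) indexes)

    @staticmethod
    def _frame():
        return {"node": 0, "type": Counter(), "family": Counter(),
                "name": Counter(), "in_text": False}

    def _count(self, kind, name=""):
        frame = self.frames[-1]
        frame["in_text"] = kind == "text"
        # Assumption: the family is keyed on the prefix, not the namespace URI.
        prefix = name.partition(":")[0] if ":" in name else ""
        frame["node"] += 1
        frame["type"][kind] += 1
        frame["family"][kind, prefix] += 1
        frame["name"][kind, name] += 1
        return (frame["node"], frame["type"][kind],
                frame["family"][kind, prefix], frame["name"][kind, name])

    def startElement(self, name, attrs):
        self.elements.append((name, self._count("element", name)))
        self.frames.append(self._frame())

    def endElement(self, name):
        self.frames.pop()

    def characters(self, content):
        if not self.frames[-1]["in_text"]:  # adjacent chunks form one text node
            self._count("text")

    def processingInstruction(self, target, data):
        self._count("pi", target)

    def comment(self, content):  # lexical handler callback
        self._count("comment")

    # Remaining lexical handler callbacks, unused in this sketch.
    def startCDATA(self):
        pass

    def endCDATA(self):
        pass

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass


def track(xml_text):
    """Parse `xml_text` and return the indexes recorded for each element."""
    handler = PositionTracker()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.setProperty(property_lexical_handler, handler)  # to see comments
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return handler.elements
```

For the sample document, the entry recorded for the last "b" element is 
(6, 3, 3, 2): sixth node, third element, third in its (empty) prefix 
family, second "b".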

Now, we can test any of the above patterns with SAX.

To take the idea further, how can a pattern like this one be tested with SAX :

It appears that what is needed is:
-also 4 kinds of size
-a means to read ahead in the SAX events
To achieve this, I intend to write a cache that can store some events 
(limited to 100 or 1000 or whatever you set as a default parameter); 
thus, when a size is requested, the engine goes on reading the input 
until the information is known, then the step is evaluated, and later 
the stored events are fetched from the cache.
This is a smart strategy because it is not limited to count(): it applies 
to any operation that needs to read further, so a predicate that contains 
following-sibling:: can also be handled. The idea is to use the cache 
only when it is explicitly required (putting a whole document in a cache 
wouldn't be SAX, but DOM). The events could also be cached in a tree 
fragment; I don't know yet what the best way to achieve this is.
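
A minimal sketch of such a bounded read-ahead cache, assuming events are 
plain ("start" | "end", name) tuples rather than real SAX callbacks. The 
names LookaheadEvents, read_ahead and siblings_after are invented for 
the illustration, and counting later siblings is just one possible use.

```python
from collections import deque


class LookaheadEvents:
    """Iterator over events that can read ahead into a bounded cache."""

    def __init__(self, events, limit=1000):
        self._events = iter(events)
        self._cache = deque()
        self._limit = limit

    def __iter__(self):
        return self

    def __next__(self):
        # Cached events are replayed first, in their original order.
        if self._cache:
            return self._cache.popleft()
        return next(self._events)

    def read_ahead(self, done):
        """Buffer events until done(event) is true.

        Returns the events up to and including that one, or None when the
        cache limit or the end of the input is reached first.
        """
        seen = 0
        for event in self._cache:
            seen += 1
            if done(event):
                return list(self._cache)[:seen]
        while len(self._cache) < self._limit:
            try:
                event = next(self._events)
            except StopIteration:
                return None
            self._cache.append(event)
            if done(event):
                return list(self._cache)
        return None


def siblings_after(stream, name):
    """Count later siblings called `name`, right after a start event."""
    depth = 1  # the current element is still open

    def parent_closed(event):
        nonlocal depth
        depth += 1 if event[0] == "start" else -1
        return depth < 0

    ahead = stream.read_ahead(parent_closed)
    if ahead is None:
        return None  # not knowable within the cache limit
    count, level = 0, 1
    for kind, event_name in ahead:
        if kind == "start":
            if level == 0 and event_name == name:
                count += 1
            level += 1
        else:
            level -= 1
    return count
```

Once the start event of the first "b" in root/a,b,b has been consumed, 
siblings_after(stream, "b") reports one more "b" to come, so the size is 
known to be 2, and the buffered events are still delivered afterwards in 
their original order. With a limit that is too small the function 
returns None, which corresponds to the case where the information cannot 
be reached within the cache.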

Of course there are cases where the expected information is not 
reachable within the limit of the cache size, or is lost because it has 
already been read, but it will help in many other cases.
As I have some code that pours SAX events into DOM trees, I'll 
provide a smart means to match a pattern on the SAX input and process 
the subtree with full XPath capabilities; this might be very helpful 
for very large documents.
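
For comparison, Python's standard library offers something close to this 
idea in xml.dom.pulldom: events are streamed, and expandNode() builds a 
DOM subtree on demand. The sketch below is only an analogy, not the code 
mentioned above: the "pattern" is reduced to a simple name test, plain 
DOM calls stand in for XPath (minidom has none), and matched_subtrees 
and the sample data are made up.

```python
from xml.dom import pulldom


def matched_subtrees(xml_text, name):
    """Yield a DOM element for each streamed element called `name`."""
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == name:
            events.expandNode(node)  # pull the rest of this subtree into DOM
            yield node


LOG = ("<log><entry id='1'><msg>a</msg></entry>"
       "<entry id='2'><msg>b</msg><msg>c</msg></entry></log>")

# Each matched subtree is processed with ordinary DOM calls.
summary = [(entry.getAttribute("id"),
            [msg.firstChild.data for msg in entry.getElementsByTagName("msg")])
           for entry in matched_subtrees(LOG, "entry")]
print(summary)
```

Only the matched subtrees are expanded as full trees; events outside 
them are simply passed over.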

What do you think of such a strategy?
Have you done something similar in Saxon?

>>Would you be hurt if someone (like me) wrote 
>>something like this :
>>     $foo/@bar/@oof
> If your data conforms to the XPath data model, then this will select
> nothing. If you want to design a language similar to XPath that works on a
> different data model, for example one in which attributes can have
> attributes, then you are free to do so (but please don't call it XPath).
> Michael Kay
> http://www.saxonica.com/

I don't want to design a language similar to XPath that works on a
different data model, but rather design a data model different to 
XPath's on which XPath can be applied :)

Why couldn't I design a data model where the value of an attribute is 
not a string but any object ?

For the moment, there are a few things that I have bypassed in XPath:
-it is said somewhere that:
   -any object can be bound to a variable.
   -the result of an XPath expression is either a (possibly empty) node 
set, a boolean, a string, or a number.
   If my XPath expression is $foo, why is it not allowed to get the 
original object? Thus, I consider that it is possible to get an object. 
How does Saxon react when $foo refers to an object? I also hope that this 
is fixed in XPath 2.0.
-comparisons on comparable objects are not done according to the XPath 
rules, but with their comparator; this is also the case when typed data 
is bound to a node (I know there are similar mechanisms in XPath 2.0), 
as shown in this example:
     ...which works like PSVI on W3C XML Schema, but I don't think that 
the type in this example can be defined with WXS (in order to sort it in 
the same way).

Now, let's consider again the question of objects contained in 
attribute values: with PSVI, it is possible to bind typed data to 
one. So, I can consider that the attribute value is a (textual) reference 
that will be parsed by my type library, which will supply the expected 
object. We like to be contortionists in computer science, but I prefer 
a straightforward way to get an almost identical result (without needing 
to design a useless string value as a reference to my object). This is 
like setting anonymous objects: if I don't care about the name (the 
textual reference), why force me to use one?
I have experimented with several objects in XPath, and I find the idea of 
storing them in attributes very useful (thus, the host object can't be 
an element, because an element can only store attributes whose values are 
strings); the same could apply to typed data after PSVI 
processing: for example, by exposing its facets as some WXS attributes. 
Active Schema is designed like that.

              (. .)
|      Philippe Poulard       |
        Have the RefleX !

