- To: Michael Kay <mike@saxonica.com>
- Subject: Re: [xml-dev] [XPath] is it legal ?
- From: Philippe Poulard <Philippe.Poulard@sophia.inria.fr>
- Date: Fri, 07 Apr 2006 17:08:25 +0200
- Cc: xml-dev@lists.xml.org
- In-reply-to: <200604071216.k37CGoqd030079@sophia.inria.fr>
- References: <200604071216.k37CGoqd030079@sophia.inria.fr>
- User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.7) Gecko/20050511
Michael Kay wrote:
>
> I assume we're talking about XPath path expressions here, not XSLT match
> patterns: so it might be better to use the verb "select" rather than
> "match".
Sure; I'm working on "pattern matching", so I have been somewhat
influenced...
Regarding pattern matching, I'm trying to fix some issues with Jaxen.
Consider the following document:
<root> <a/><!-- --><b/><?pi ip?><b foo='bar'/></root>
When the node under test is the last "b" element, I get the following results:
matched?  pattern
--------  -------
false     a
true      b
true      root/b
false     a/b
true      /root/b
true      /root/b[@foo]
false     /root/b[@bar]
true      /root/b[1]
false     /root/b[2]
true      /root/*[1]
false     /root/*[2]
false     /root/*[3]
true      /root/node()[1]
true      /root/node()[5]
true      /root/node()[6]
I will report a bug to the Jaxen team.
When XPath expressions are evaluated, Jaxen works fine, but patterns fail
when a position is involved in a predicate: Jaxen considers that
the first item matches (except with node(), which always
matches), because it creates a context that contains only the node under
test (when a pattern is evaluated), or the nodes selected by the
previous step (when an XPath expression is evaluated, which is fine).
As the entry point of a pattern is a single node, I have rewritten some
classes in order to consider the absolute position of the node, which
may vary according to the matcher used in the step; here are my results:
false     /root/b[1]
true      /root/b[2]
false     /root/*[1]
false     /root/*[2]
true      /root/*[3]
false     /root/node()[1]
false     /root/node()[5]
true      /root/node()[6]
(I will test later the case "prefix:*")
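To make the expected positions concrete, here is a minimal sketch in Python (not Jaxen's Java code, and not my actual patch). It assumes that the position of a node in a pattern step is its rank among the siblings accepted by the node test of that step; the helper names (position, any_node, any_element, named) are made up for the illustration, and position() assumes the node itself passes the test.

```python
from xml.dom import Node, minidom

DOC = "<root> <a/><!-- --><b/><?pi ip?><b foo='bar'/></root>"


def position(node, test):
    """Return the 1-based rank of node among the siblings accepted by test."""
    count = 0
    sibling = node
    while sibling is not None:
        if test(sibling):
            count += 1
        sibling = sibling.previousSibling
    return count


def any_node(node):
    # node() accepts every kind of child
    return True


def any_element(node):
    # * accepts elements only
    return node.nodeType == Node.ELEMENT_NODE


def named(name):
    # a name test such as "b" accepts elements with that name
    return lambda node: any_element(node) and node.tagName == name


root = minidom.parseString(DOC).documentElement
last_b = root.lastChild  # the <b foo='bar'/> element

print(position(last_b, named("b")))   # 2, so /root/b[2] matches
print(position(last_b, any_element))  # 3, so /root/*[3] matches
print(position(last_b, any_node))     # 6, so /root/node()[6] matches
```

The same node therefore has three different positions depending on the node test, which is why a context holding only the node under test cannot give the right answer.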
Jaxen provides many navigators, and I plan to write a tree walker for
SAX, to be used with both XPath patterns and XPath expressions.
What all this makes me think is that only a small amount of
information is worth keeping: the ancestors with their attributes and
namespace declarations, and their 4 kinds of positions:
-the absolute index (when node() is involved)
-the absolute index within the type (*, @*, text(), comment(), pi(), ns::*)
-the absolute index within the "family" (p:*, @p:*)
-the absolute index among nodes of the same name (foo, p:foo, @bar, @p:bar,
pi(target), ns::p)
(The position of the document is 1)
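The four counters can be sketched as follows. This is only an illustration under stated assumptions: PositionTracker and Frame are hypothetical names, the events are fed by hand rather than by a real SAX parser, attributes and namespace nodes are left out, and the "family" is approximated by the prefix of the qualified name instead of the namespace URI.

```python
from collections import Counter


class Frame:
    """Counters kept for the children of one open ancestor."""

    def __init__(self):
        self.total = 0              # index used by node()
        self.by_type = Counter()    # index used by *, text(), comment(), ...
        self.by_family = Counter()  # index used by p:*
        self.by_name = Counter()    # index used by foo, p:foo, pi(target)


class PositionTracker:
    def __init__(self):
        # the bottom frame counts the children of the document node
        self.stack = [Frame()]

    def child(self, kind, name=None):
        """Register a child event and return its 4 positions."""
        frame = self.stack[-1]
        prefix = name.split(":")[0] if name and ":" in name else ""
        frame.total += 1
        frame.by_type[kind] += 1
        frame.by_family[(kind, prefix)] += 1
        frame.by_name[(kind, name)] += 1
        return (
            frame.total,
            frame.by_type[kind],
            frame.by_family[(kind, prefix)],
            frame.by_name[(kind, name)],
        )

    def start_element(self, name):
        positions = self.child("element", name)
        self.stack.append(Frame())  # counters for the new element's children
        return positions

    def end_element(self):
        self.stack.pop()


# Events of <root> <a/><!-- --><b/><?pi ip?><b foo='bar'/></root>, fed by hand
tracker = PositionTracker()
tracker.start_element("root")
tracker.child("text")
tracker.start_element("a")
tracker.end_element()
tracker.child("comment")
tracker.start_element("b")
tracker.end_element()
tracker.child("pi", "pi")
positions = tracker.start_element("b")
print(positions)  # (6, 3, 3, 2): node()[6], *[3], family index 3, b[2]
```

Only one small frame per open ancestor is kept, so the memory used grows with the depth of the document, not with its size.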
Now, we can test any of the above patterns with SAX.
To take the idea further, how can such a pattern be tested with SAX:
/root/*[last()]
?
It appears that what is needed is:
-also 4 kinds of sizes
-a means to read ahead in the SAX events
To achieve this, I intend to write a cache that can store some events
(limited to 100 or 1000 or whatever is set as the default parameter);
thus, when a size is requested, the engine goes on reading the input
until the information is known, then the step is evaluated, and later
the stored events are replayed.
This is a smart strategy because it is not limited to count(), but applies to
any operation that requires reading ahead; thus a predicate that contains
following-sibling:: may also be handled. The idea is to use the cache
only when it is explicitly required (putting a whole document in a cache
wouldn't be SAX, but DOM). The events could also be cached in a tree
fragment; I don't know yet what the best way to achieve this is.
Of course there are cases where the expected information is not
reachable within the limit of the cache size, or is lost because it has
already been read, but it will help in many other cases.
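A rough sketch of such a bounded cache, to show the read-ahead and replay mechanics. Everything here is an assumption made for the illustration: the class EventCache, the exception LookaheadError, the (kind, name) event tuples, and the helper following_element_siblings are hypothetical and do not come from Jaxen or from any SAX API.

```python
from collections import deque


class LookaheadError(Exception):
    """Raised when the answer is not reachable within the cache limit."""


class EventCache:
    def __init__(self, events, limit=1000):
        self._source = iter(events)
        self._buffer = deque()
        self._limit = limit

    def __iter__(self):
        return self

    def __next__(self):
        # cached events are replayed before reading new input
        if self._buffer:
            return self._buffer.popleft()
        return next(self._source)

    def look_ahead(self):
        """Yield upcoming events without consuming them."""
        for event in list(self._buffer):
            yield event
        while True:
            if len(self._buffer) >= self._limit:
                raise LookaheadError("cache limit reached")
            try:
                event = next(self._source)
            except StopIteration:
                return
            self._buffer.append(event)
            yield event


def following_element_siblings(cache):
    """Count element siblings after the element whose start was just read."""
    depth, count = 1, 0  # depth 1: we are inside the current element
    for kind, _name in cache.look_ahead():
        if kind == "start":
            if depth == 0:
                count += 1
            depth += 1
        elif kind == "end":
            if depth == 0:
                return count  # the parent is closing
            depth -= 1
    return count


EVENTS = [
    ("start", "root"), ("text", None), ("start", "a"), ("end", "a"),
    ("comment", None), ("start", "b"), ("end", "b"), ("pi", "pi"),
    ("start", "b"), ("end", "b"), ("end", "root"),
]

cache = EventCache(EVENTS, limit=100)
for _ in range(3):
    next(cache)  # consume up to the start of <a>
remaining = following_element_siblings(cache)
print(remaining)         # 2, so the size for * is 1 + 2 = 3
replayed = list(cache)   # the cached events are delivered afterwards
print(len(replayed))     # 8
```

With a limit that is too small, look_ahead() raises LookaheadError before reading the event that would not fit, so no input is lost and the caller can decide how to degrade.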
As I have some code that pours SAX events into DOM trees, I'll
provide a smart means to match a pattern on the SAX input, and process
the subtree with full XPath capabilities; this might be very helpful
for very large documents.
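For comparison, Python's standard library offers a similar "stream, then expand the matched subtree" facility in xml.dom.pulldom. The sketch below uses it with a plain element name as a stand-in for a real pattern; the sample document and the function matched_subtrees are made up for the illustration, and this is not my SAX-to-DOM code.

```python
from xml.dom import pulldom

DOC = (
    "<root><item id='1'><v>a</v></item><skip/>"
    "<item id='2'><v>b</v></item></root>"
)


def matched_subtrees(xml_text, name):
    """Stream the input and build a DOM subtree only for matching elements."""
    stream = pulldom.parseString(xml_text)
    for event, node in stream:
        if event == pulldom.START_ELEMENT and node.tagName == name:
            stream.expandNode(node)  # pull the rest of the subtree into DOM
            yield node


values = [
    subtree.getElementsByTagName("v")[0].firstChild.data
    for subtree in matched_subtrees(DOC, "item")
]
print(values)  # ['a', 'b']
```

Each expanded subtree is an ordinary DOM fragment, so it could be handed to an XPath engine while the rest of the document is never held in memory.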
What do you think of such a strategy?
Did you do something similar in Saxon?
>>
>>Will you be hurted if someone (like me) was writting
>>something like this :
>> $foo/@bar/@oof
>>
>
> If your data conforms to the XPath data model, then this will select
> nothing. If you want to design a language similar to XPath that works on a
> different data model, for example one in which attributes can have
> attributes, then you are free to do so (but please don't call it XPath).
>
> Michael Kay
> http://www.saxonica.com/
>
I don't want to design a language similar to XPath that works on a
different data model, but rather design a data model different from
XPath's on which XPath can be applied :)
Why couldn't I design a data model where the value of an attribute is
not a string but any object?
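To illustrate what I mean, here is a toy sketch, assuming a data model in which an attribute value may itself be a host of attributes. The class Host and the function attribute_path are hypothetical names made up for this message; this is not how RefleX or any XPath engine is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A hypothetical node whose attribute values may be any object."""
    attributes: dict = field(default_factory=dict)


def attribute_path(start, *names):
    """Follow successive @name steps; return [] when a step selects nothing."""
    current = start
    for name in names:
        if not isinstance(current, Host) or name not in current.attributes:
            return []
        current = current.attributes[name]
    return [current]


# $foo/@bar/@oof on the object-valued data model
foo = Host({"bar": Host({"oof": 42})})
print(attribute_path(foo, "bar", "oof"))    # [42]

# With a string-valued attribute, as in the XPath data model,
# the second step selects nothing
plain = Host({"bar": "some text"})
print(attribute_path(plain, "bar", "oof"))  # []
```

The second case reproduces the behaviour Michael describes for data that conforms to the XPath data model; only the first case needs the extended model.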
For the moment, there are a few things that I have bypassed in XPath:
-it is said somewhere that:
-any object can be bound to a variable.
-the result of an XPath expression is either a (possibly empty) node
set, a boolean, a string, or a number.
If my XPath expression is $foo, why is it not allowed to get the
original object? Thus, I consider that it is possible to get an object.
How does Saxon react when $foo refers to an object? I also hope that this is
fixed in XPath 2.0.
-comparisons on comparable objects are not done according to the XPath
rules, but with their comparator; the same applies when typed data
is bound to a node (I know there are similar mechanisms in XPath 2.0),
as shown in this example:
http://reflex.gforge.inria.fr/tutorial.html#N8013D5
...which works like the PSVI with W3C XML Schema, but I don't think that
the type in this example can be defined with WXS (in order to sort it in
the same way).
Now, let's consider again the question of objects contained in
attribute values: with the PSVI, it is possible to bind typed data to
them. So, I can consider that the attribute value is a (textual) reference
that will be parsed by my type library, which will supply the expected
object. We like to be contortionists in computer science, but I prefer
a straightforward way to get an almost identical result (without needing
to design a useless string value as a reference to my object). This is
like using anonymous objects: if I don't care about the name (the
textual reference), why force me to use one?
I have experimented with several objects in XPath, and I find the idea of
storing them in attributes very useful (thus, the host object can't be
an element, because an element can only store attributes whose values are
strings); it could also be the same with typed data after PSVI
processing: for example, by exposing its facets as some WXS attributes.
Active Schema is designed like that.
--
Cordialement,
///
(. .)
--------ooO--(_)--Ooo--------
| Philippe Poulard |
-----------------------------
http://reflex.gforge.inria.fr/
Have the RefleX !