Re: We need an XPath API

From: Robin Berjon <robin@knowscape.com>
To: Charles Reitzel <creitzel@mediaone.net>
Date: Mon, 05 Mar 2001 21:33:12 +0100
First and foremost, Charles, thank you very much for your summary, I
believe it to be very useful.

At 10:47 05/03/2001 -0500, Charles Reitzel wrote:
>1) Give it the DOM treatment, rather than SAX treatment
>
>I don't see the need for callbacks at all at the expression level.  Better
>just to slurp up a string and be done with it.  XPath expressions appear as
>either attribute or element values. So even dealing with an InputSource
>seems unnecessary in an early version.

I do :-) But from the feedback I've had, it seems that I'm the only
person who needs callbacks from XPath parsing, so I'll probably have to
roll my own interface for that. Note though that I am *not* interested in
low-level callbacks but rather in a factory interface, so that I could
build custom XPath objects.

There are several things that an XPath interface could provide. Mostly 1) a
way to look into and manipulate an XPath expression for various purposes
and 2) a way to query a DOM relative to a given XPath.

Two interesting methods I can think of for 2) are
$xpath->select_nodes($current_node) and $xpath->matches($node) (I'm using
Perl-style syntax for my examples, but I think it should be
understandable). These would presumably work on any DOM Node. However,
I'd like to see a factory interface in order to be able to create XPath
objects optimized for different uses, for different tree models
(potentially non-DOM), etc.
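
As a rough illustration of the two methods and the factory, here is a
minimal sketch in Python rather than Perl. Node, ChildPath and
XPathFactory are hypothetical names invented for this example, and only
child-axis paths such as "section/para" are handled; it is not a
proposal for the actual interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Stand-in for any tree node; not a real DOM class."""
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


class ChildPath:
    """Hypothetical XPath object for paths like 'a/b' (child axis only)."""

    def __init__(self, steps: List[str]):
        self.steps = steps

    def select_nodes(self, context: Node) -> List[Node]:
        # Apply each step to the children of the current node list.
        current = [context]
        for step in self.steps:
            current = [c for n in current for c in n.children if c.name == step]
        return current

    def matches(self, node: Optional[Node]) -> bool:
        # Walk upwards, comparing names from the last step to the first.
        for step in reversed(self.steps):
            if node is None or node.name != step:
                return False
            node = node.parent
        return True


class XPathFactory:
    """Hypothetical factory; a subclass could return optimized objects."""

    def create(self, expression: str) -> ChildPath:
        return ChildPath(expression.split("/"))


root = Node("doc")
section = root.add(Node("section"))
para = section.add(Node("para"))
xpath = XPathFactory().create("section/para")
selected = xpath.select_nodes(root)   # nodes reached from root
is_match = xpath.matches(para)        # pattern-style test on one node
```

A factory for a different tree model would only need to return objects
offering the same two methods.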

>2) SAC as prior art from Simon St. Laurent and Robin Berjon
>
>Yes, clearly there is some conceptual overlap.  My reading of CSS1 and CSS2
>shows no references to XPath, however.  So, we have two independent W3C
>XPath-like syntaxes (hmmph).  Perhaps the big difference is the HTML legacy
>baggage.  It also seems that CSS2 is not used much compared to XSLT+XPath.

CSS isn't really an XPath-like syntax. Using XPath for CSS selectors was
brought up on www-style and rejected, for (imho) good reasons. The best
reason is obviously that it'd break compatibility across CSS versions, but
that's of little interest here. It's important to see that CSS has
different requirements from XPath. Its selector model revolves entirely
around elements. You can test for other types of nodes, but what you select
is always elements, which is why it has pseudo-elements, classes and
pseudo-classes of elements, and so forth. Its intent is to apply style to
a document, not to return data to be processed.

Comparing CSS2 and XSLT+XPath is unfair, they cover different (though
overlapping) domains. Comparing CSS2 to XSL would be fairer.

The part of SAC that I believe would be useful to investigate is its nice
Selector/Condition model. That can imho be directly applied to XPath as
Path/Predicate. It doesn't really cover Expr, so that's where the
interesting overlap stops (SAC does define a number of operators that
don't exist in CSS and could be used for XPath, but given that they have
never been used in the context of CSS, I don't know how useful they could
be).

>Emulating the SAC Selectors, Conditions and their respective factories looks
>good.  Perhaps we could even use these interfaces directly.  I'm wary of
>unwanted dependencies, however.

I'd go against using the interfaces directly. I started drafting an XPath
mapping of SAC yesterday just to see how far it goes and a number of
things came out differently.

>To be clear, this is for XPath, not CSS.

I'd like to be able to construct a selector object totally abstracted from
the selector syntax. I can do that with SAC's factories and I'd like to be
able to do the same with the XPath interface, which is why I'm interested
in factories. Using such a selector object, I could have a) CSS/SAC and
XPath/<whatever> (XQuery, SQL?) parsers using dedicated factories, b) a
single selector interface for the objects those factories create and c)
backend SelectionDrivers for any kind of data that can be usefully
queried. That way, if I have a tree model that isn't DOM (and for which an
Adapter would be inappropriate, perhaps for performance reasons) and I
want to select data from it using selector syntaxes that are already
implemented, all I need to implement is a SelectionDriver that understands
what the selector object requests from it.
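
To make the three layers more concrete, here is a minimal Python sketch
under stated assumptions: Selector, Condition, the SelectionDriver
methods and TupleTreeDriver are all hypothetical names made up for
illustration, and the selector only covers "descendant element with an
optional attribute equality test".

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Optional, Protocol


@dataclass(frozen=True)
class Condition:
    """Syntax-neutral equality test on an attribute (hypothetical)."""
    attribute: str
    value: str


@dataclass(frozen=True)
class Selector:
    """Syntax-neutral 'descendant named X, optionally with a condition'."""
    name: str
    condition: Optional[Condition] = None


class SelectionDriver(Protocol):
    """What a backend must answer so that a selector can be evaluated."""
    def descendants(self, item: Any) -> Iterable[Any]: ...
    def name_of(self, item: Any) -> str: ...
    def attribute_of(self, item: Any, attribute: str) -> Optional[str]: ...


class TupleTreeDriver:
    """Driver for a non-DOM model made of (name, attrs, children) tuples."""
    def descendants(self, item):
        for child in item[2]:
            yield child
            yield from self.descendants(child)

    def name_of(self, item):
        return item[0]

    def attribute_of(self, item, attribute):
        return item[1].get(attribute)


def select(selector: Selector, driver: SelectionDriver, root: Any) -> List[Any]:
    """Evaluate the selector using only what the driver exposes."""
    hits = []
    for item in driver.descendants(root):
        if driver.name_of(item) != selector.name:
            continue
        cond = selector.condition
        if cond and driver.attribute_of(item, cond.attribute) != cond.value:
            continue
        hits.append(item)
    return hits


# A CSS parser (for p[class="note"]) and an XPath parser (for
# //p[@class='note']) could both hand back this same object.
note_paragraphs = Selector("p", Condition("class", "note"))

tree = ("doc", {}, [
    ("p", {"class": "note"}, []),
    ("div", {}, [("p", {"class": "body"}, []), ("p", {"class": "note"}, [])]),
])
result = select(note_paragraphs, TupleTreeDriver(), tree)
```

Supporting another data model then means writing another driver; the
parsers and the selector object stay untouched.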

Maybe this design is flawed somehow, and maybe I'm asking for too much.
That's one of the schemes I have in mind when I think about an XPath API.

>I think we may have a different use case here for CSS that is unlikely to
>apply to XPath.  When pulling in an entire CSS stylesheet, I can see the
>sense of the callback approach.

Again, I probably expressed myself badly when I mentioned callbacks. I
don't care about stream processing of XPath à la SAX because it's true
that it's unlikely one will have a 10 GB XPath expression. What I'm
interested in is factories. SAC has those so that people may plug in their
own implementations easily (which may for instance extend CSS selectors
beyond the ones defined in the spec; I discussed such a scheme with Manos
Batsis from the human markup folks and it seems that it would make sense).

Also, I think a factory approach would make sense considering that XPath is
moving towards being able to use the PSVI. Perhaps other people would also
like to see XPath have the capacity to select against all sorts of other
infosets. To do so using factories, they would only need to define new
XPath tokens that are grammatically valid and a few factory classes, and
voilà! An extended XPath that can deal specifically with their infoset. I
believe that would make XPath interestingly extensible.

>I don't know if this helps, but the QName doesn't need resolving until the
>the XPath expression is actually evaluated.  I.e. you can parse the
>expression, which would probably only include NS prefixes (or not).  At
>evaluation time, the NS URI for the prefix is a moving target.

I agree. I also guess one could simply specify a namespace map for a given
XPath in order to convert its prefixes to namespace URIs.
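
For what it's worth, Python's xml.etree.ElementTree works this way for
its limited XPath subset: the expression keeps its prefixes and a map is
supplied at evaluation time. The document and the example.org URI below
are made up for the illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<r xmlns:x="http://example.org/ns"><x:item>one</x:item></r>'
)

# The prefix used in the expression ("p") need not match the one used in
# the document ("x"); only the URI it maps to matters.
namespace_map = {"p": "http://example.org/ns"}
items = doc.findall("p:item", namespaces=namespace_map)
```

The same expression string can thus be evaluated against different maps
without being re-parsed by the caller.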

>I don't understand yet how an XPath expression can point to something "not
>neatly aligned on DOM Node boundaries".  To hazard a guess, is it be related
>to unexpanded external entities? In which case, "you can't get there from
>here" may be a reasonable answer from the library.

For instance, in DOM you can have consecutive text nodes. This usually
happens as a result of XML parsers returning text in several chunks. XPath
has no notion of that: if it selects the text() inside an element, I guess
that what is expected is that it returns a single text node.

Chances are that this text node isn't in the DOM (though its content is).
For people who want to use XPath to navigate (and edit) DOM documents,
that can be a serious caveat, because in effect the returned text node is
quite likely to be read-only (well, you can write to it, but it won't
modify the tree).

That could be treated the way live NodeSets ought (imho as always) to be
treated. That is, they aren't live unless someone adds an extra layer to
make them so. One could add a layer that tracks changes to returned text
nodes and modifies the DOM accordingly.

The other option is to return as many text nodes as there are in the DOM,
but that could break various expectations (stuff relying on position for
instance).
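
The mismatch is easy to reproduce with Python's xml.dom.minidom. This
sketch builds the two adjacent chunks by hand (a parser might or might
not produce them), and the "merged" node stands in for what an XPath
layer might hand back; that part is an assumption, not the behaviour of
any particular library.

```python
from xml.dom import Node, minidom

doc = minidom.parseString("<p/>")
p = doc.documentElement
p.appendChild(doc.createTextNode("Hello, "))
p.appendChild(doc.createTextNode("world"))

# DOM view: two adjacent Text nodes.
chunks = [n for n in p.childNodes if n.nodeType == Node.TEXT_NODE]

# XPath view: a single text node. Built as a fresh object, it is not part
# of the tree, so writing to it leaves the document untouched.
merged = doc.createTextNode("".join(n.data for n in chunks))
merged.data = "changed"
still_original = p.toxml()

# Alternative: merge the chunks in the tree itself before querying.
p.normalize()
```

Calling normalize() first sidesteps the problem, at the price of
modifying the caller's document.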

>Iterator staleness is a problem w/ all query result sets.  I.e. the database
>row can get deleted out from under the cursor.  A set member can be deleted,
>leaving a dangling reference in a iterator.  There are no perfect solutions
>to this problems and developers all learn about it after they stub their
>toes a few times.

I think there is space for several options here. In some languages it's
likely that it would be more efficient to return a simple list. A long time
ago I worked on a prototype XPath compiler in Perl and using lists has
advantages given that you can translate paths to a series of map()s and
grep()s. However, for the sake of memory consumption (as well as other
considerations) one could return a NodeIterator or a TreeWalker as defined
in DOM2 (http://www.w3.org/TR/DOM-Level-2-Traversal-Range/traversal.html).
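
The trade-off can be sketched in Python with xml.etree.ElementTree,
with list comprehensions standing in for Perl's map() and grep() and a
generator standing in for a NodeIterator. The document and the helper
children_named are invented for the example.

```python
import xml.etree.ElementTree as ET
from itertools import islice

doc = ET.fromstring("<r><a k='1'><b/></a><a><b/><b/></a><c><b/></c></r>")

# Eager: each step filters or expands a fully built list, roughly what
# the path a/b (relative to r) could compile to.
a_nodes = [n for n in doc if n.tag == "a"]                 # grep
b_list = [c for n in a_nodes for c in n if c.tag == "b"]   # map, then grep


# Lazy: the same steps as generators; nothing is visited until asked for.
def children_named(nodes, tag):
    for node in nodes:
        for child in node:
            if child.tag == tag:
                yield child


b_iter = children_named(children_named([doc], "a"), "b")
first_two = list(islice(b_iter, 2))
```

The list version is simpler and allows indexing; the generator version
never holds more than the current node per step.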

>Certainly, an XML syntax is not the highest priority.  If it gets at all
>controversial, better to scrap it.  (Ducking pie thrown by Jonathan Robie).

I'm not personally interested in an XML syntax for XPath, but I'd find an
XML syntax for selectors in general useful and interesting. Again, allowing
one to plug in factories (is this starting to sound like I'm obsessed?)
would allow one to generate an XML string, fire SAX events, whatever.

>5) Need XPointer Support
>
>Doesn't XPointer just use XPath?  In which case, the lib should be able to
>do these things.  I guess this starts getting into XSLT and
>XPointer-specific extensions to XPath.  This probably calls for a couple
>SAX-style extension identifier URNs.  So an app can say "I need XSLT 1.1
>XPath extensions" and the parser can say yes or no.

Yes, I think XPointer doesn't need to be built in, but it should be kept
in mind that the XPath API must play nicely with other such needs.

Which reminds me, the API needs a way to add functions (if only for XSLT).
I think part of the recent discussion about xs:script and notably the xbind
proposal would be useful in this context (it would be useful to be able to
add functions easily, and to declare their signatures).
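
One possible shape for this, sketched in Python: a registry in which
each function is stored together with its declared argument and return
types. FunctionLibrary, its methods and the ext:repeat function are
hypothetical, and the coercion is only loosely modelled on XPath's
implicit conversions.

```python
from typing import Callable, Dict, Tuple


class FunctionLibrary:
    """Hypothetical registry of extension functions with signatures."""

    def __init__(self) -> None:
        self._functions: Dict[str, Tuple[Callable, Tuple[type, ...], type]] = {}

    def register(self, name, func, arg_types, return_type):
        self._functions[name] = (func, tuple(arg_types), return_type)

    def call(self, name, *args):
        if name not in self._functions:
            raise LookupError(f"unknown function: {name}")
        func, arg_types, return_type = self._functions[name]
        if len(args) != len(arg_types):
            raise TypeError(f"{name}() expects {len(arg_types)} argument(s)")
        # Coerce each argument to its declared type before the call.
        coerced = [t(a) for t, a in zip(arg_types, args)]
        return return_type(func(*coerced))


library = FunctionLibrary()
# Declared as (string, number) -> string; XPath numbers are doubles.
library.register("ext:repeat", lambda s, n: s * int(n), (str, float), str)
repeated = library.call("ext:repeat", "ab", "3")
```

With the signature declared up front, the evaluator can reject a call
with the wrong number of arguments before running any user code.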

>I have also used Matt Seargent's XML::XPath module as well - with excellent
>results.  It's a real nice module.  It is also what triggered my original
>posting.

Yes, it's a great module, and incidentally it was after I started wanting
to use it to query data structures other than its own DOM implementation
that I got interested in an XPath API.

-- robin b.
In which level of metalanguage are you now speaking?