xml-dev - Re: [xml-dev] Question for the XPath and DOM folks


To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Question for the XPath and DOM folks
From: Jeni Tennison <jeni@jenitennison.com>
Date: Mon, 22 Jul 2002 15:40:15 +0100
In-reply-to: <031801c2302d$0d15f6c0$c000a8c0@salutia>
Organization: Jeni Tennison Consulting Ltd
References: <E17W0t6-00066X-00@malatesta.local> <031801c2302d$0d15f6c0$c000a8c0@salutia>
Reply-to: Jeni Tennison <jeni@jenitennison.com>

Garland Foster wrote:
> And how would you fix it? You have to:
> a) Respect the Xpath spec so you can't return 3 TEXT nodes
> b) Respect the DOM spec so you have 3 text nodes in the tree
> So you have 3 nodes and you must return only one of them.
> No way out. No way out.
>
> "An Xpath query applied to a DOM tree can return a different result
> than an Xpath query applied directly to an XML document"

A radical (very probably too radical) solution would be to decouple
the XPath expression language from the data model that it uses. Rather
than specifying an XPath data model, we could say that XPath is an
expression language for querying into the data model used by its host
language, and that it is then up to the host language to define an
appropriate data model, and to define the axes and node tests that are
useful for accessing information in that data model. (There would then
be no concept of using an XPath query "directly" on an XML document,
but it would be feasible for an XPath query applied to the DOM tree
created from an XML document to give a different result from the same
XPath query applied to the XSLT tree created from the same XML
document.)

The DOM data model could then retain CDATA sections and entity
references as distinct node types, and retain the view that CDATA
sections are a type of text node. With the XML document:

<foo>
bar 
<![CDATA[
baz 
]]>
quux
</foo>

an XPath-in-DOM like:

  /foo/text()

would return a text node, a CDATA section node, and another text node.
An XPath-in-DOM like:

  /foo/cdata-section()

would return a CDATA section.
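
As a rough illustration of the two results above, here is a minimal
Python sketch using the standard library's xml.dom.minidom, which keeps
CDATA sections as distinct nodes. The NODE_TESTS table and the
select_children() helper are hypothetical names invented for this
sketch; they stand in for node tests that a host language might define
and are not part of any XPath or DOM specification.

```python
from xml.dom import Node, minidom

# The example document from the message above.
XML = "<foo>\nbar \n<![CDATA[\nbaz \n]]>\nquux\n</foo>"

# Hypothetical host-defined node tests for an "XPath-in-DOM" dialect.
# Here text() matches plain text nodes and CDATA sections alike, since
# the DOM treats a CDATA section as a kind of text node.
NODE_TESTS = {
    "text()": lambda n: n.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE),
    "cdata-section()": lambda n: n.nodeType == Node.CDATA_SECTION_NODE,
}


def select_children(element, node_test):
    """Return the children of `element` that pass the named node test."""
    matches = NODE_TESTS[node_test]
    return [child for child in element.childNodes if matches(child)]


doc = minidom.parseString(XML)
foo = doc.documentElement

dom_text = select_children(foo, "text()")            # like /foo/text()
dom_cdata = select_children(foo, "cdata-section()")  # like /foo/cdata-section()

# The XPath 1.0 data model has no CDATA node type, so it would see the
# same characters as one merged text node.
merged = "".join(node.data for node in dom_text)

print([node.nodeName for node in dom_text])
print([node.nodeName for node in dom_cdata])
print(repr(merged))
```

The point of the sketch is only that the same characters come back as
three nodes under the DOM view and as a single string under the merged
view; a real XPath-in-DOM would of course need full path evaluation.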

I can see the argument that this would mean a difference between using
an XPath in the DOM and using an XPath in XSLT or XQuery, and that
this would be a Bad Thing. And it would certainly be great if all
these standards used the same data model such that the same XPath
meant the same thing in all of them. I think that would only be
tenable, though, if the features in DOM that aren't in the XPath data
model (i.e. CDATA sections and entity references) weren't actually
being used in DOM anyway. I don't know whether that's the case, but I
imagine that they were included in DOM for a reason and that for some
people (I'm thinking particularly editor implementers) they're really
useful. It just seems a shame not to allow people to use the nice
XPath syntax to access all the information in their DOM tree.

Cheers,

Jeni

P.S. I find the idea of XPath working over several different data
models appealing because it ties into something that I've been playing
with recently: an XPath-like query language for "the Layered Markup
and Annotation Language" (LMNL). I'm going to be talking about LMNL at
Extreme, but basically it uses a data model based on Gavin Nicol's
Core Range Algebra (also at Extreme) that views a document as a
sequence of characters plus any number of named "ranges" over those
characters, thus dealing nicely with overlapping constructs such as:

   o v e r l a p s   a r e   a l l o w e d
  +--------italic---------+
                    +--------bold---------+

Of course the meaning of axes like "descendant" and particularly
"parent" and "child" is a bit different in a non-tree-based data
model, but it's still possible to come up with reasonable definitions
such as "a descendant of a range is any range that starts and ends
within the range", and thus it's still possible to construct XPath
expressions that operate over this data model to pull out useful
information.
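
To make that definition of "descendant" concrete, here is a small
Python sketch over the overlap example above. The Range class, the
descendants() function, and the extra "word" range are all assumptions
made up for illustration; this is not LMNL's actual data model or API,
just ranges as (name, start, end) triples over a character sequence.

```python
from dataclasses import dataclass

TEXT = "overlaps are allowed"


@dataclass(frozen=True)
class Range:
    name: str
    start: int  # offset of the first character covered
    end: int    # offset just past the last character covered


def descendants(outer, ranges):
    """Ranges that start and end within `outer`, excluding `outer` itself."""
    return [
        r for r in ranges
        if r != outer and outer.start <= r.start and r.end <= outer.end
    ]


italic = Range("italic", 0, 12)  # "overlaps are"
bold = Range("bold", 9, 20)      # "are allowed"
word = Range("word", 9, 12)      # "are" (hypothetical extra range)
ranges = [italic, bold, word]

# italic and bold overlap, so neither is a descendant of the other,
# but the "word" range lies inside both of them.
print(descendants(italic, ranges))
print(descendants(bold, ranges))
```

Note that under this definition a range can be a descendant of two
ranges that overlap each other, which cannot happen in a tree. Two
distinct ranges with identical extents would also count as descendants
of each other, so a real definition would need a tie-breaking rule.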

Using Jaxen, it's actually really easy to adapt XPath to a different
data model -- that's what Jaxen's built for, after all -- though
unfortunately you still have to use the same set of axes and node
types...

---
Jeni Tennison
http://www.jenitennison.com/
