
Re: [xml-dev] Be careful writing XPath expressions when the XML could have non-existent elements

From: Wendell Piez <wapiez@wendellpiez.com>
To: "Costello, Roger L." <costello@mitre.org>
Date: Wed, 30 Jul 2014 10:49:57 -0400

Hi,

Roger writes:
> Hi Folks,
>
> Are you writing XPath expressions?
>
> XPath is embedded in a lot of things: XQuery, Schematron, XSLT, XML Schema. If you are using XML, there is a good chance you are writing XPath expressions.
>
> Heads up!
>
> Failure to take into account the following rule will result in countless headaches and hard-to-detect bugs.
>
>         Rule: The result of evaluating an XPath expression
>         that compares a non-existent element against
>         anything is always false.
>
> More ... http://xfront.com/Be-Careful-Writing-XPath-Expressions-Against-XML-Documents-that-may-have-Non-Existent-Elements.pdf

Okay, but how is this a bug and not a feature?

I concur that if your code is bad, you will get bugs, and if you don't
know how the language works, you are more likely to write bad code.

But in the general case, I want //div[@class='section'] to
return every div that has a 'class' attribute whose value is
'section', and none whose @class is something else or absent
altogether. So "@class='section'" returns false() when @class returns
an empty sequence (or an empty node-set in XPath 1.0).

What do you want it to return: a run-time error?
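
A minimal sketch of that selection behavior, using Python's standard
xml.etree.ElementTree (chosen here only for illustration; it implements
a small XPath subset). The sample markup and the id values are made up:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one div with class 'section', one with a
# different class, and one with no class attribute at all.
doc = ET.fromstring(
    "<body>"
    "<div class='section' id='a'/>"
    "<div class='note' id='b'/>"
    "<div id='c'/>"
    "</body>"
)

# The [@class='section'] predicate is simply false for the div that
# has no class attribute; no error is raised.
matches = doc.findall(".//div[@class='section']")
selected_ids = [d.get("id") for d in matches]
print(selected_ids)
```

Only the first div is selected; the div without @class is skipped
quietly, which is exactly the filtering one usually wants.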

I'd prefer to see the problem of a missing @class on a div -- if it is
a problem -- caught upstream (we have schema validation to help with
such things). It's much better if the point of failure is near the
point of control.

Similarly, the situation you spell out in your paper is created by not
respecting the following constraint -- which you have imposed on
yourself: "If Genre is missing, the default is 'non-fiction'".

This means that an XPath to retrieve non-fiction books should look like this:

Book[Genre = 'non-fiction' or empty(Genre)]

or maybe (if you want to be fancy and reduce to a single test)

Book[empty(Genre[. ne 'non-fiction'])]

and so forth.
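
To see the difference between the naive test and the default-aware
one, here is a rough sketch in Python. ElementTree's XPath subset has
no 'or' and no empty(), so the second test is spelled out in Python
instead; the sample catalog, the Title elements, and the helper name
is_non_fiction are assumptions made for the example:

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: book C has no Genre and so defaults to
# 'non-fiction' under the stated rule.
books = ET.fromstring(
    "<Books>"
    "<Book><Title>A</Title><Genre>non-fiction</Genre></Book>"
    "<Book><Title>B</Title><Genre>fiction</Genre></Book>"
    "<Book><Title>C</Title></Book>"
    "</Books>"
)

# Naive test, like Book[Genre = 'non-fiction']: misses book C.
naive = [b.findtext("Title")
         for b in books.findall("Book[Genre='non-fiction']")]

def is_non_fiction(book):
    """Mirror Book[Genre = 'non-fiction' or empty(Genre)]."""
    genre = book.find("Genre")
    return genre is None or genre.text == "non-fiction"

# Default-aware test: a missing Genre counts as 'non-fiction'.
with_default = [b.findtext("Title")
                for b in books.findall("Book") if is_non_fiction(b)]

print(naive, with_default)
```

The naive list holds only A, while the default-aware list holds A and
C.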

Alternatively -- and frequently better, in larger systems with lots of
such rules -- provide your default values explicitly in a
normalization pre-process, so your XSLT can see them.
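
Such a normalization pass could be sketched as follows, again in
Python rather than XSLT purely for illustration. The function name
add_default_genre and the sample data are hypothetical; the default
value is the one from the rule quoted above:

```python
import xml.etree.ElementTree as ET

DEFAULT_GENRE = "non-fiction"  # default taken from the stated rule

def add_default_genre(root):
    """Give every Book that lacks a Genre child an explicit default."""
    # findall returns a list, so the tree can be modified safely here.
    for book in root.findall(".//Book"):
        if book.find("Genre") is None:
            ET.SubElement(book, "Genre").text = DEFAULT_GENRE
    return root

# Hypothetical input: book C has no Genre element.
catalog = ET.fromstring(
    "<Books>"
    "<Book><Title>B</Title><Genre>fiction</Genre></Book>"
    "<Book><Title>C</Title></Book>"
    "</Books>"
)
add_default_genre(catalog)

# After normalization the plain test finds the defaulted book.
titles = [b.findtext("Title")
          for b in catalog.findall("Book[Genre='non-fiction']")]
print(titles)
```

Once the default is made explicit, downstream queries no longer need
to know the rule.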

It's very true that the relatively fault-tolerant semantics of XPath
logic mean that using it requires both care and a clear
understanding of the inputs -- including any formal constraints (or
lack thereof) that come with their definition. For example -- what
should happen with <Genre>   non-fiction   </Genre> ... should we
allow this to mean the same thing as <Genre>non-fiction</Genre> ? You
decide.
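
The whitespace question can be made concrete with a small sketch.
Joining the result of split() is used here as a rough stand-in for
XPath's normalize-space(); that equivalence is an approximation, not
an exact match for the XPath definition of whitespace:

```python
import xml.etree.ElementTree as ET

# The padded value from the example above.
book = ET.fromstring("<Book><Genre>   non-fiction   </Genre></Book>")
raw = book.findtext("Genre")

# Exact string comparison: the padding makes this fail.
exact_match = raw == "non-fiction"

# Lenient comparison: trim and collapse whitespace first, roughly
# what normalize-space(Genre) = 'non-fiction' would do in XPath.
lenient_match = " ".join(raw.split()) == "non-fiction"

print(exact_match, lenient_match)
```

Which of the two comparisons is "right" depends on the constraints you
have declared for the data.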

But that's totally normal, and if you figure you can design a querying
technology that doesn't assume such understanding on the part of the
user, and yet which works well in the real world -- I'd like to see
it. :-)

Cheers, Wendell

-- 
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^

Follow-Ups:
- Re: [xml-dev] Be careful writing XPath expressions when the XML could have non-existent elements
  - From: John Cowan <johnwcowan@gmail.com>

References:
- Be careful writing XPath expressions when the XML could have non-existent elements
  - From: "Costello, Roger L." <costello@mitre.org>
