Processing elements with a default or fixed value can be tricky ... here's an example that illustrates this
- From: Roger L Costello <costello@mitre.org>
- To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
- Date: Fri, 27 May 2022 11:40:08 +0000
Hi Folks,
Scenario: An XML document contains a list of books. Each Book element holds, among other things, the name of the book's author. For some books the author's name is not known. In those cases the Author element is omitted and its value defaults to "UNKNOWN". Here is a sample XML document:
<Books>
    <Book>
        <Title>The Science of Programming</Title>
        <Author>David Gries</Author>
    </Book>
    <Book>
        <Title>Compiler Construction for Digital Computers</Title>
        <Author>David Gries</Author>
    </Book>
    <Book>
        <Title>The Emperor's New Mind</Title>
    </Book>
    <Book>
        <Title>Algorithms</Title>
    </Book>
    <Book>
        <Title>The Path to Power</Title>
        <Author>Robert A. Caro</Author>
    </Book>
</Books>
What we would like to know is:
How many distinct authors are there in the book list,
where all books with an UNKNOWN author together
count as 1?
For the above XML document, the answer is 3 (David Gries, Robert A. Caro, and UNKNOWN).
Problem: Write an XPath expression that produces the correct answer.
You might be tempted to use this XPath expression:
count(distinct-values(/Books/Book/Author))
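However, that is not correct. For the above XML document it returns 2, because the two Book elements that have no Author contribute nothing to the sequence being counted.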
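The undercount can be reproduced outside an XPath processor. The following is only a rough Python sketch, assuming the standard-library xml.etree.ElementTree module as a stand-in for an XPath 2.0 engine (which it is not). The names BOOKS_XML, author_values, and naive_count are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The sample document from this message (illustrative variable name).
BOOKS_XML = """<Books>
  <Book><Title>The Science of Programming</Title><Author>David Gries</Author></Book>
  <Book><Title>Compiler Construction for Digital Computers</Title><Author>David Gries</Author></Book>
  <Book><Title>The Emperor's New Mind</Title></Book>
  <Book><Title>Algorithms</Title></Book>
  <Book><Title>The Path to Power</Title><Author>Robert A. Caro</Author></Book>
</Books>"""

root = ET.fromstring(BOOKS_XML)

# Analogue of /Books/Book/Author: only Author elements that actually exist
# are selected, so the two books without an Author yield nothing.
author_values = [author.text for author in root.findall("./Book/Author")]

# Analogue of count(distinct-values(...)).
naive_count = len(set(author_values))
print(naive_count)  # prints 2
```

Five books go in, but only three values reach the set, and two of those are duplicates.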
Make this Book element the context node (in an XML editor, position your cursor on it):
<Book>
    <Title>The Emperor's New Mind</Title>
</Book>
And ask the question: How many Author elements are in the book?
This XPath expression:
count(Author)
returns 0
However, if we apply the string() function to Author (which selects nothing here, so string() receives an empty sequence):
count(string(Author))
we get this result: 1
Michael Kay explains the behavior difference this way:
> string() applied to an empty sequence returns the zero-length string,
> whereas atomization applied to an empty sequence returns an empty sequence.
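In other words, string() applied to an empty sequence returns one value (a string whose length happens to be 0), whereas without the string() function there is no value at all. Inside a path expression this means that /Books/Book/Author yields nothing for a Book with no Author, while /Books/Book/string(Author) yields exactly one string for every Book.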
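The difference between the two behaviors can be imitated in a few lines of Python. This is a sketch under stated assumptions, not how any XPath processor is implemented: atomize and xpath_string are hypothetical helpers written for this illustration, and xml.etree.ElementTree merely supplies the (possibly empty) list of Author elements.

```python
import xml.etree.ElementTree as ET

# The Book with no Author element, used as the context node.
book = ET.fromstring("<Book><Title>The Emperor's New Mind</Title></Book>")


def atomize(elements):
    """Hypothetical stand-in for atomization: one value per element."""
    return ["".join(e.itertext()) for e in elements]


def xpath_string(elements):
    """Hypothetical stand-in for string(): always exactly one string.

    Simplification: only the first element is considered, whereas
    XPath 2.0 raises a type error when string() gets more than one item.
    """
    return "".join(elements[0].itertext()) if elements else ""


authors = book.findall("Author")     # analogue of the path step Author
print(len(authors))                  # analogue of count(Author): prints 0
print(len(atomize(authors)))         # nothing to atomize: prints 0
print(len([xpath_string(authors)]))  # analogue of count(string(Author)): prints 1
```

An empty list stays empty under atomize, while xpath_string always hands back one value, here the zero-length string.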
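Returning to our question:
How many distinct authors are there in the book list,
where all books with an UNKNOWN author together
count as 1?
The appropriate XPath expression (XPath 2.0 or later, since it uses distinct-values() and a function call as a path step) is now clear. Every Book contributes exactly one string, and all books with no Author contribute the same zero-length string, which counts as one distinct value: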
count(distinct-values(/Books/Book/string(Author)))
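For readers without an XPath 2.0 processor at hand, the same counting logic can be approximated in Python. This is a minimal sketch, assuming that Element.findtext with default="" is an acceptable analogue of string(Author). The names BOOKS_XML, values_per_book, and distinct_authors are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The sample document from this message (illustrative variable name).
BOOKS_XML = """<Books>
  <Book><Title>The Science of Programming</Title><Author>David Gries</Author></Book>
  <Book><Title>Compiler Construction for Digital Computers</Title><Author>David Gries</Author></Book>
  <Book><Title>The Emperor's New Mind</Title></Book>
  <Book><Title>Algorithms</Title></Book>
  <Book><Title>The Path to Power</Title><Author>Robert A. Caro</Author></Book>
</Books>"""

root = ET.fromstring(BOOKS_XML)

# Analogue of /Books/Book/string(Author): exactly one string per Book,
# the zero-length string when the Book has no Author child.
values_per_book = [book.findtext("Author", default="") for book in root.findall("Book")]

# Analogue of count(distinct-values(...)).
distinct_authors = set(values_per_book)
print(len(distinct_authors))  # prints 3
```

In this sketch, as in the XPath expression, an Author element that is present but empty would fall into the same group as the missing ones.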
/Roger