XML.org: OASIS Mailing List Archives

Processing elements with a default or fixed value can be tricky... here's an example that illustrates this

Hi Folks,

Scenario: There is an XML document containing a list of book data. Each Book element has, among other things, an Author element giving the name of the book's author. For some books the author's name is not known. In those cases the Author element is omitted and the author is taken to be "UNKNOWN". Here is a sample XML document:

<Books>
    <Book>
        <Title>The Science of Programming</Title>
        <Author>David Gries</Author>
    </Book>
    <Book>
        <Title>Compiler Construction for Digital Computers</Title>
        <Author>David Gries</Author>
    </Book>
    <Book>
        <Title>The Emperor's New Mind</Title>
    </Book>
    <Book>
        <Title>Algorithms</Title>
    </Book>
    <Book>
        <Title>The Path to Power</Title>
        <Author>Robert A. Caro</Author>
    </Book>
</Books>

What we would like to know is:

  	How many distinct authors are there in the book list,
	where all the books with an UNKNOWN author together
	are counted as 1?

For the above XML document, the answer is 3: David Gries, Robert A. Caro, and UNKNOWN.

Problem: Write an XPath expression that produces the correct answer.

You might be tempted to use this XPath expression:

	count(distinct-values(/Books/Book/Author))

However, that is not correct. For the above XML document it returns: 2
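
Why 2? A rough way to see it, sketched in Python with the standard library's xml.etree.ElementTree. This is only an analogue of the XPath expression, not an XPath 2.0 processor, and the variable names are made up for the sketch. Selecting Author across all the books yields only the Author elements that actually exist, so the two books without one contribute nothing:

```python
import xml.etree.ElementTree as ET

# The sample document from this post.
BOOKS_XML = """
<Books>
    <Book>
        <Title>The Science of Programming</Title>
        <Author>David Gries</Author>
    </Book>
    <Book>
        <Title>Compiler Construction for Digital Computers</Title>
        <Author>David Gries</Author>
    </Book>
    <Book>
        <Title>The Emperor's New Mind</Title>
    </Book>
    <Book>
        <Title>Algorithms</Title>
    </Book>
    <Book>
        <Title>The Path to Power</Title>
        <Author>Robert A. Caro</Author>
    </Book>
</Books>
"""

root = ET.fromstring(BOOKS_XML)

# Rough analogue of /Books/Book/Author: only existing elements are selected.
authors = root.findall("./Book/Author")
author_names = [a.text for a in authors]

# Rough analogue of count(distinct-values(...)).
naive_count = len(set(author_names))

print(len(authors))  # 3 Author elements for 5 books
print(naive_count)   # 2
```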

Position your cursor on this Book element (that is, make it the context node):

    <Book>
        <Title>The Emperor's New Mind</Title>
    </Book>

And ask the question: How many Author elements are in the book?

This XPath expression: 

	count(Author)

returns 0

However, if we apply the string() function to the result of selecting Author (an empty sequence, since the book has no Author element):

	count(string(Author))

we get this result: 1

Michael Kay explains the behavior difference this way:

> string() applied to an empty sequence returns the zero-length string, 
> whereas atomization applied to an empty sequence returns an empty sequence.

In other words, string() applied to an empty sequence returns one value (a string whose length happens to be 0), whereas without the string() function there is no value at all, and so nothing to count.
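
The difference can be imitated in Python. In this minimal sketch, atomize and string_of are hypothetical helpers written for this illustration (they are not part of any XPath library), and ElementTree stands in for a real XPath 2.0 processor:

```python
import xml.etree.ElementTree as ET

book = ET.fromstring("<Book><Title>The Emperor's New Mind</Title></Book>")


def atomize(elements):
    """Rough analogue of atomization: one value per selected node,
    so an empty selection gives an empty list."""
    return ["".join(e.itertext()) for e in elements]


def string_of(elements):
    """Rough analogue of string(): always exactly one string,
    the zero-length string when nothing was selected."""
    if len(elements) > 1:
        # XPath 2.0 rejects a sequence of more than one item here.
        raise TypeError("string() takes at most one item")
    return "".join(elements[0].itertext()) if elements else ""


selected = book.findall("Author")   # analogue of the path expression Author

print(len(selected))                # 0, like count(Author)
print(atomize(selected))            # [], no value to count
print([string_of(selected)])        # [''], one value, like count(string(Author))
```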

Returning to our question:

  	How many distinct authors are there in the book list,
	where all the books with an UNKNOWN author together
	are counted as 1?

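The appropriate XPath expression is now clear:

	count(distinct-values(/Books/Book/string(Author)))

Each Book now contributes exactly one string. The two books without an Author both contribute the zero-length string, which distinct-values() collapses into a single value, so the result is 3. Note that distinct-values() and a function call used as a path step both require XPath 2.0.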
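
The same counting can be sketched in Python with the standard library's xml.etree.ElementTree. Again this is only an analogue under assumptions: findtext with default="" plays the role of string(Author), and the final relabelling of "" as UNKNOWN is an illustrative extra that the XPath expression itself does not perform.

```python
import xml.etree.ElementTree as ET

# The sample document from this post.
BOOKS_XML = """
<Books>
    <Book>
        <Title>The Science of Programming</Title>
        <Author>David Gries</Author>
    </Book>
    <Book>
        <Title>Compiler Construction for Digital Computers</Title>
        <Author>David Gries</Author>
    </Book>
    <Book>
        <Title>The Emperor's New Mind</Title>
    </Book>
    <Book>
        <Title>Algorithms</Title>
    </Book>
    <Book>
        <Title>The Path to Power</Title>
        <Author>Robert A. Caro</Author>
    </Book>
</Books>
"""

root = ET.fromstring(BOOKS_XML)

# Rough analogue of /Books/Book/string(Author): exactly one string per Book,
# with "" standing in for a missing Author.
author_strings = [
    book.findtext("Author", default="") for book in root.findall("Book")
]

# Rough analogue of count(distinct-values(...)).
distinct_authors = set(author_strings)
print(len(distinct_authors))  # 3

# Illustrative extra: show the zero-length string as UNKNOWN.
labelled = sorted(name or "UNKNOWN" for name in distinct_authors)
print(labelled)  # ['David Gries', 'Robert A. Caro', 'UNKNOWN']
```

One caveat with this approach, in XPath as in the sketch: a Book with an empty Author element would also produce the zero-length string, so it would be counted together with the books that have no Author at all.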

/Roger



Copyright 1993-2007 XML.org. This site is hosted by OASIS