XML.org: OASIS Mailing List Archives

Creating a "complete" specification is hard

Hi Folks,

 

Yesterday I posted a message asking:

 

With XML documents like this:

 

<Books>
    <Book>
        <Title>The Science of Programming</Title>
        <Author>David Gries</Author>
    </Book>
    <Book>
        <Title>Compiler Construction for Digital Computers</Title>
        <Author>David Gries</Author>
    </Book>
    <Book>
        <Title>The Emperor's New Mind</Title>
    </Book>
    <Book>
        <Title>Algorithms</Title>
    </Book>
    <Book>
        <Title>The Path to Power</Title>
        <Author>Robert A. Caro</Author>
    </Book>
</Books>

 

How can we answer this question:

 

              How many distinct authors are there in the book list,
              where the books with an UNKNOWN author are
              counted as 1?

 

It was pointed out to me that my specification is ambiguous:

 

  • Should a Book element with an empty Author element and a Book element with an absent Author element be treated as equivalent or as distinct?
  • Should Book elements with no Author be counted together as a single value, or separately as distinct values (because each author is undefined)?
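
To make the first distinction concrete, here is a minimal sketch in Python (standard-library ElementTree, not XPath) that classifies each Book by the state of its Author element. The three-book sample and the helper name author_status are my own illustrative assumptions, not part of the original question.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-book sample: one absent, one empty, one named Author.
SAMPLE = """<Books>
  <Book><Title>Algorithms</Title></Book>
  <Book><Title>Constraint Programming Languages</Title><Author></Author></Book>
  <Book><Title>The Path to Power</Title><Author>Robert A. Caro</Author></Book>
</Books>"""

def author_status(book):
    """Classify a Book as 'absent', 'empty', or 'named' (illustrative helper)."""
    author = book.find("Author")
    if author is None:
        return "absent"   # no Author element at all
    if (author.text or "") == "":
        return "empty"    # <Author></Author> or <Author/>
    return "named"

statuses = [author_status(b) for b in ET.fromstring(SAMPLE).findall("Book")]
print(statuses)  # ['absent', 'empty', 'named']
```

Whether "absent" and "empty" should then be merged into one bucket is exactly the choice the specification has to make.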

 

Wow!

 

Lesson Learned: Creating a specification that considers all cases is hard.

 

Let’s extend the sample XML document to include a Book element with an empty Author element:

 

<Books>
    <Book>
        <Title>The Science of Programming</Title>
        <Author>David Gries</Author>
    </Book>
    <Book>
        <Title>Compiler Construction for Digital Computers</Title>
        <Author>David Gries</Author>
    </Book>
    <Book>
        <Title>The Emperor's New Mind</Title>
    </Book>
    <Book>
        <Title>Algorithms</Title>
    </Book>
    <Book>
        <Title>The Path to Power</Title>
        <Author>Robert A. Caro</Author>
    </Book>
    <Book>
        <Title>Constraint Programming Languages</Title>
        <Author></Author>
    </Book>
</Books>

 

Here are XPath expressions to handle each case. The first requires XPath 2.0 or later; the two that use "let" require XPath 3.0 or later:

 

Treat a Book element with an empty Author element and a Book element with an absent Author element as equivalent:

count(distinct-values(/Books/Book/string(Author)))

 

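Evaluating that XPath expression on the XML document yields: 3 (the distinct values are "David Gries", "Robert A. Caro", and the empty string, which covers both the absent and the empty Author)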
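
As a cross-check outside XPath, here is a rough Python sketch of the same interpretation using the standard-library ElementTree. It assumes each Author holds plain text only, so findtext with a default of "" stands in for string(Author); the variable names are mine.

```python
import xml.etree.ElementTree as ET

# The extended six-book document from the message, written compactly.
SAMPLE = """<Books>
  <Book><Title>The Science of Programming</Title><Author>David Gries</Author></Book>
  <Book><Title>Compiler Construction for Digital Computers</Title><Author>David Gries</Author></Book>
  <Book><Title>The Emperor's New Mind</Title></Book>
  <Book><Title>Algorithms</Title></Book>
  <Book><Title>The Path to Power</Title><Author>Robert A. Caro</Author></Book>
  <Book><Title>Constraint Programming Languages</Title><Author></Author></Book>
</Books>"""

books = ET.fromstring(SAMPLE).findall("Book")

# findtext returns "" for an empty Author and the default ("") for an
# absent one, so both cases collapse into the same set member.
author_strings = {book.findtext("Author", default="") for book in books}

equivalent_count = len(author_strings)
print(equivalent_count)  # 3
```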

 

Treat a Book element with an empty Author element and a Book element with an absent Author element as distinct:

let $book := /Books/Book
  return
    count(distinct-values($book/Author)) + xs:integer(exists($book[not(Author)]))

 

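Evaluating that XPath expression on the XML document yields: 4 (three distinct values among the Author elements that are present, namely "David Gries", "Robert A. Caro", and the empty string, plus 1 because at least one Book has no Author element)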
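
The same interpretation can be sketched in Python with the standard-library ElementTree. This is only an approximation of the XPath: it assumes plain-text Author content, and the variable names are my own.

```python
import xml.etree.ElementTree as ET

# The extended six-book document from the message, written compactly.
SAMPLE = """<Books>
  <Book><Title>The Science of Programming</Title><Author>David Gries</Author></Book>
  <Book><Title>Compiler Construction for Digital Computers</Title><Author>David Gries</Author></Book>
  <Book><Title>The Emperor's New Mind</Title></Book>
  <Book><Title>Algorithms</Title></Book>
  <Book><Title>The Path to Power</Title><Author>Robert A. Caro</Author></Book>
  <Book><Title>Constraint Programming Languages</Title><Author></Author></Book>
</Books>"""

books = ET.fromstring(SAMPLE).findall("Book")

# Distinct values of the Author elements that are present
# (an empty Author contributes the empty string).
present_values = {a.text or "" for b in books for a in b.findall("Author")}

# All books with no Author element count together as one extra value.
any_absent = any(b.find("Author") is None for b in books)

distinct_count = len(present_values) + int(any_absent)
print(distinct_count)  # 4
```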

 

Treat each Book element with an absent Author element as distinct:

 

let $book := /Books/Book
  return
    count(distinct-values($book/Author)) + count($book[not(Author)])

 

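Evaluating that XPath expression on the XML document yields: 5 (three distinct values among the Author elements that are present, plus 2 for the two Book elements with no Author element)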
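
For comparison, a Python sketch of this last interpretation, again with the standard-library ElementTree and assuming plain-text Author content. The only change from the "counted together" reading is that every Book without an Author adds 1; the variable names are mine.

```python
import xml.etree.ElementTree as ET

# The extended six-book document from the message, written compactly.
SAMPLE = """<Books>
  <Book><Title>The Science of Programming</Title><Author>David Gries</Author></Book>
  <Book><Title>Compiler Construction for Digital Computers</Title><Author>David Gries</Author></Book>
  <Book><Title>The Emperor's New Mind</Title></Book>
  <Book><Title>Algorithms</Title></Book>
  <Book><Title>The Path to Power</Title><Author>Robert A. Caro</Author></Book>
  <Book><Title>Constraint Programming Languages</Title><Author></Author></Book>
</Books>"""

books = ET.fromstring(SAMPLE).findall("Book")

# Distinct values of the Author elements that are present.
present_values = {a.text or "" for b in books for a in b.findall("Author")}

# Each book with no Author element is treated as its own unknown author.
absent_count = sum(1 for b in books if b.find("Author") is None)

separate_count = len(present_values) + absent_count
print(separate_count)  # 5
```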

 

Acknowledgements

 

Thank you to the following people for their tremendous help and insights:

 

  • Michael Kay
  • Dimitre Novatchev
  • Wendell Piez

 

/Roger

 




Copyright 1993-2007 XML.org. This site is hosted by OASIS