Hi Folks,

I recently came across this statement in a requirements document:

    Verify using XPath that there are no elements or attributes in an XML document with more than X amount of characters.

That seemingly simple requirement has ambiguities that senders and receivers must wrestle with. Here are the ambiguities:

1. Does "X amount of characters" include whitespace?

2. Can the XML contain mixed content? Consider this XML document:

    <root> <child>0123 <grandchild>abc</grandchild> </child> </root>

The <child> element has mixed content. Here is a tree diagram of the <child> element:

    child
    |-- text: '0123 '
    |-- grandchild
    |   `-- text: 'abc'
    `-- text: ' '

Which of the following should be considered the string length of the <child> element’s content:

(a) The string length is computed using the child text nodes:
        string-length('0123 ') + string-length(' ')

(b) The string length is computed using all descendant text nodes:
        string-length('0123 ') + string-length('abc') + string-length(' ')

(c) The string length is computed using the non-whitespace child text nodes (that is, ignoring whitespace-only text nodes):
        string-length('0123 ')

(d) The string length is computed using the non-whitespace child text nodes, after normalizing space:
        string-length('0123')

(e) The string length is computed using all non-whitespace descendant text nodes, after normalizing space:
        string-length('0123') + string-length('abc')

Conservative Sender
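Before deciding which reading a conservative sender should adopt, it helps to see the five candidate lengths (a) through (e) side by side. Below is a minimal Python sketch, offered only as an illustration and not as part of the requirement, that computes each of them for the <child> element. It assumes the sample document with its whitespace collapsed to single spaces, matching the string literals in the list above. The helper names (child_text_nodes, descendant_text_nodes, normalize_space, candidate_lengths) are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Sample document from the post; whitespace is assumed to be single spaces.
DOC = "<root> <child>0123 <grandchild>abc</grandchild> </child> </root>"

# The four characters that XML (and XPath normalize-space) treat as whitespace.
XML_WS = " \t\r\n"


def child_text_nodes(elem):
    """Text nodes that are direct children of elem, in document order."""
    nodes = [elem.text] + [sub.tail for sub in elem]
    return [t for t in nodes if t]  # drop None and empty strings


def descendant_text_nodes(elem):
    """Every text node below elem, in document order."""
    return [t for t in elem.itertext() if t]


def normalize_space(s):
    """Approximate XPath normalize-space(): trim, then collapse whitespace runs."""
    return " ".join(re.split(r"[ \t\r\n]+", s.strip(XML_WS))).strip()


def is_whitespace_only(s):
    """True when the text node consists solely of XML whitespace."""
    return s.strip(XML_WS) == ""


def candidate_lengths(elem):
    """Compute the candidate lengths (a) through (e) for elem."""
    children = child_text_nodes(elem)
    descendants = descendant_text_nodes(elem)
    return {
        "a": sum(len(t) for t in children),
        "b": sum(len(t) for t in descendants),
        "c": sum(len(t) for t in children if not is_whitespace_only(t)),
        "d": sum(len(normalize_space(t)) for t in children
                 if not is_whitespace_only(t)),
        "e": sum(len(normalize_space(t)) for t in descendants
                 if not is_whitespace_only(t)),
    }


child = ET.fromstring(DOC).find("child")
lengths = candidate_lengths(child)
print(lengths)
```

For the sample document this gives a=6, b=9, c=5, d=4 and e=7. With the original line breaks and indentation restored, (a) and (b) would grow while (d) and (e) would stay the same.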
Postel’s principle says that a sender should be conservative in what it sends. What does it mean to be conservative in this case? Conservative means that the XPath should find the longest string and verify that it has no more than X characters. For the example above, (b) represents the longest string. Thus, the XPath must verify that the string length of all descendant text nodes of an element does not exceed $x and that the string length of each attribute value does not exceed $x.

The string value of the root element is the concatenation of every text node in the document, so no other element can have a longer string value. The XPath can therefore simply check the root element and all attributes:

    (string-length(/*) le $x) and (empty(//@*[string-length() gt $x]))

Liberal Receiver
Postel’s principle says that a receiver should be liberal in what it receives. What does it mean to be liberal in this case? Liberal means that for each element the XPath should check that, after normalizing the whitespace in its text, the string length does not exceed $x characters. And, of course, the length of each attribute value must not exceed $x characters. Here’s the XPath:

    empty((//*/normalize-space(), //@*)[string-length(.) gt $x])

So if $x is 10, then receivers would accept this XML document: <root> but it would not be acceptable to senders.

I am interested in seeing other, actual requirements that are ambiguous and require senders and receivers to wrestle with.

/Roger