XML Schema union type is evil (for XPath 2.0 processing)
- From: "Costello, Roger L." <costello@mitre.org>
- To: <xml-dev@lists.xml.org>
- Date: Mon, 14 Apr 2008 14:23:32 -0400
Hi Folks,
If you use XML Schema unions, please be aware of their pitfalls, which
are described below.
In Michael Kay's book, XPath 2.0 (pp. 259 and 289), he shows four cases
where the use of a union type can lead to problems. Here are the four
cases:
1. Consider this <prices> element which contains a list of prices. If
no price is available then N/A is listed:
<prices>40.99 19.00 N/A 23.80</prices>
Each list value is either a decimal or the string value, "N/A". That
is, each list value is a union of:
- xs:decimal
- a simpleType with enumeration value of "N/A"
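For concreteness, here is a minimal sketch of how such a list type might
be declared in XML Schema. It assumes a schema with no target namespace,
and the type names priceOrNA and priceList are illustrative only; they
are not taken from any real schema.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- One list item: either a decimal price or the literal "N/A".
       (The name priceOrNA is hypothetical.) -->
  <xs:simpleType name="priceOrNA">
    <xs:union memberTypes="xs:decimal">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="N/A"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>

  <!-- A whitespace-separated list of such items.
       (The name priceList is hypothetical.) -->
  <xs:simpleType name="priceList">
    <xs:list itemType="priceOrNA"/>
  </xs:simpleType>

  <xs:element name="prices" type="priceList"/>

</xs:schema>
```

With a declaration along these lines, a schema-aware processor types
each list item by the first member type it validates against, so 40.99
becomes an xs:decimal while N/A becomes a string-derived value.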
Suppose I want to write an XPath expression to see if there are some
prices over 30.00. Here's one way to express it:
if (some $i in data(prices) satisfies $i gt 30.00) then
    'Expensive stuff'
else
    'Cheap stuff'
I ran this XPath using SAXON and got this output: Expensive stuff.
Then I changed the input by swapping the first list value with the N/A
value:
<prices>N/A 19.00 40.99 23.80</prices>
I ran the same XPath against this input, using the same SAXON processor,
and I got an error message saying that I can't compare the string "N/A"
against the decimal 30.00.
So, depending on the "order" of the input data I get a successful
result or an error!
Furthermore, even with the first version of the input:
<prices>40.99 19.00 N/A 23.80</prices>
I may, or may not, get an error. SAXON evaluates the list values from
left to right and stops as soon as it finds a true value (40.99 gt
30.00 is true, so it stops before it ever reaches N/A). XPath
processors are free, however, to evaluate the list values in any order.
So, another XPath processor may evaluate the list values from right to
left, hit N/A before it finds a true value, and give an error.
Recap:
(a) You may, or may not, get an error depending on the order of the
list values.
(b) You may, or may not, get an error depending on the XPath processor
that you use.
The good news is that there is a way to protect yourself against this
problem:
if (some $i in data(prices)[. instance of xs:decimal]
    satisfies $i gt 30.00) then
    'Expensive stuff'
else
    'Cheap stuff'
The predicate filters out the "N/A" list value, so the situation where
"N/A" is compared against 30.00 can never arise.
2. The same problem arises with the "every" expression, e.g.
if (every $i in data(prices) satisfies $i lt 30.00) then
    'Buy at this store'
else
    'Shop elsewhere'
With this input:
<prices>40.99 19.00 N/A 23.80</prices>
SAXON gives this output: Shop elsewhere
With this input (swap the first list value with "N/A"):
<prices>N/A 19.00 40.99 23.80</prices>
SAXON generates an error.
Again, it is possible to protect yourself:
if (every $i in data(prices)[. instance of xs:decimal]
    satisfies $i lt 30.00) then
    'Buy at this store'
else
    'Shop elsewhere'
3. Next, consider a <quantity> element whose value is either a number
or the string "out-of-stock". Here are two examples:
<quantity>out-of-stock</quantity>
<quantity>20</quantity>
The value of quantity is either a number or the string value
"out-of-stock". That is, the value is a union of:
- xs:nonNegativeInteger
- a simpleType with enumeration value of "out-of-stock"
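A minimal sketch of how this union might be declared in XML Schema,
again assuming no target namespace. The type name quantityOrOutOfStock
is illustrative only.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Either a count or the literal "out-of-stock".
       (The name quantityOrOutOfStock is hypothetical.) -->
  <xs:simpleType name="quantityOrOutOfStock">
    <xs:union memberTypes="xs:nonNegativeInteger">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="out-of-stock"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>

  <xs:element name="quantity" type="quantityOrOutOfStock"/>

</xs:schema>
```

Member types are tried in order, so 20 is typed as
xs:nonNegativeInteger, while out-of-stock falls through to the
enumeration, which is derived from xs:string.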
Now, suppose I want to write an XPath expression to see if the quantity
is out-of-stock:
if (data(quantity) eq 'out-of-stock') then
    'Bummer'
else
    'Buy them all!'
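With the first example above as input, the output is: Bummer
With the second example as input, an error is generated, because the
typed value 20 is an xs:nonNegativeInteger and eq cannot compare an
integer against a string.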
Again, there is a way to protect yourself:
if (data(quantity) instance of xs:string) then
    if (data(quantity) eq 'out-of-stock') then
        'Bummer'
    else
        "Something is screwed up in the input"
else
    'Buy them all!'
4. Another problem with union types is that you can't use them in the
type declarations of XSLT parameters or variables. For example, you can
have an attribute in the schema whose type is a union of (xs:date,
xs:gYearMonth, xs:gYear), but you can't declare a variable or parameter
of that type; the declared item type has to be either atomic or a node.
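To make this concrete, the kind of schema declaration meant here might
look like the following sketch. The names flexibleDate and when are
hypothetical, and the schema is assumed to have no target namespace.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- A date given to day, month, or year precision.
       (The name flexibleDate is hypothetical.) -->
  <xs:simpleType name="flexibleDate">
    <xs:union memberTypes="xs:date xs:gYearMonth xs:gYear"/>
  </xs:simpleType>

  <!-- A global attribute of that union type.
       (The name when is hypothetical.) -->
  <xs:attribute name="when" type="flexibleDate"/>

</xs:schema>
```

An attribute declared this way validates fine, but a type name like
flexibleDate cannot appear in the "as" attribute of xsl:variable or
xsl:param. One possible workaround, as far as I can tell, is to declare
the variable as xs:anyAtomicType and then test the value with
"instance of" before using it.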
SUMMARY
1. Input data that contains union values must be dealt with carefully.
2. If you don't design the XPath to protect yourself, then your XPath
may succeed with some inputs and fail with others; it may succeed with
some XPath processors and fail with others.
3. While it is possible to write XPath expressions to "protect
yourself" it is, I think, likely that people will either:
- forget to do so
- not know how to do so
- not be aware of the problem with union types
/Roger