At 08:04 AM 1/4/2002 -0800, Dare Obasanjo wrote:
>----- Original Message -----
>From: "Jonathan Robie" <jonathan.robie@softwareag.com>
>To: "Dare Obasanjo" <kpako@yahoo.com>; "Champion, Mike"
><Mike.Champion@SoftwareAG-USA.com>; <www-xml-query-comments@w3.org>;
><xml-dev@lists.xml.org>
>Cc: <suciu@cs.washington.edu>
>Sent: Thursday, January 03, 2002 4:26 PM
>Subject: Re: [xml-dev] The use of XML syntax in XML Query
>
> > 2. Typechecking without type inference can work fine as long as you do not
> > have joins. If joins are involved, typechecking becomes undecidable.
>
>So what happens with XQuery joins ( http://www.w3.org/TR/xquery/#id-joins )
>then? Will there be a caveat in the recommendation that indicates that static
>typechecking can't be done w.r.t joins?
What this means is that we need to use type inference for static
typechecking in XQuery, because without it, typechecking becomes
undecidable. Fortunately, type inference is precisely the approach we take.
> > 3. Type inference is the most promising approach, but it does lead to some
> > false negatives. He gives an example using a content model involving equal
> > cardinality among three different elements in sequence. The schema for this
> > can not be expressed in DTDs or in XML Schema, so it is not clear to me
> > that this is a real limitation, but I just read this, and I need to do some
> > thinking before I would want to draw a strong conclusion based on his
> > examples.
>
>But his example is fairly simple and utilizes none of the really complex
>abilities of XML schemas.
His example involves the following query:
<result>
{
for $x in document("doc.xml")//elm return <a>{ $x/text() }</a>
for $x in document("doc.xml")//elm return <b>{ $x/text() }</b>
for $x in document("doc.xml")//elm return <c>{ $x/text() }</c>
}
</result>
In most current systems, the inferred content model of the result element is:
a*, b*, c*
Now suppose the DTD we require for our output is this:
(((a,a)*, (b,b)*, (c,c)*) | ((a,a)*,a, (b,b)*,b, (c,c)*,c))
In this content model, either a, b, and c all occur an even number of times,
or they all occur an odd number of times. Dan points out that the inferred
content model is not a subset of the required content model, so static type
checking would not be able to guarantee the required output. Incidentally,
you don't need as complicated a required type as Dan uses to get this
result; the following would suffice:
((a,a), (b,b), (c,c))
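To make Dan's point concrete, here is a minimal Python sketch (an illustration only, not how any XQuery implementation works) that writes the content models as regular expressions over one-letter element names. The names INFERRED, EVEN_OR_ODD, SIMPLER, query_output and find_witness are hypothetical, and the witness search is a bounded brute-force stand-in for the automaton-based inclusion test a real checker would use.

```python
import itertools
import re
from typing import Optional

# Content models as regular expressions over one-letter element names.
INFERRED = r"a*b*c*"                                         # a*, b*, c*
EVEN_OR_ODD = r"(?:aa)*(?:bb)*(?:cc)*|(?:aa)*a(?:bb)*b(?:cc)*c"
SIMPLER = r"aabbcc"                                          # ((a,a), (b,b), (c,c))

def query_output(n: int) -> str:
    """Children of <result> when doc.xml contains n elm elements."""
    return "a" * n + "b" * n + "c" * n

def find_witness(inferred: str, required: str, alphabet: str = "abc",
                 max_len: int = 4) -> Optional[str]:
    """A shortest sequence admitted by `inferred` but rejected by `required`.

    Bounded search for illustration only: None means no witness was found
    up to max_len, not that inclusion actually holds.
    """
    inf, req = re.compile(inferred), re.compile(required)
    for length in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=length):
            seq = "".join(letters)
            if inf.fullmatch(seq) and not req.fullmatch(seq):
                return seq
    return None

# Every output this query can actually produce has equal numbers of a, b, c,
# so it always satisfies the even/odd model...
assert all(re.fullmatch(EVEN_OR_ODD, query_output(n)) for n in range(20))

# ...but the inferred type a*, b*, c* is not a subset of either required model.
print(repr(find_witness(INFERRED, EVEN_OR_ODD)))  # 'a'
print(repr(find_witness(INFERRED, SIMPLER)))      # ''
```

The assertion is the view a human has after reading the query; the witness search is all a checker can conclude when the only thing it knows about the result is a*, b*, c*.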
Here's my take on this:
First off, we know that static typechecking is conservative. If a query
fails static typechecking, that does not mean it will not generate a
correct result; it means only that it cannot be guaranteed to generate a
correct result. That's one of the reasons that static typechecking is
optional in XQuery. In this case, a human being can read the query and see
that a more precise type could have been inferred, and that this more
precise type would conform to the required output type. One of the things
we spend time on in the XQuery Working Group is examining the kinds of type
information that might be computed for an expression, and trying to make it
as precise as possible. There will almost certainly be cases like the one
Dan points out where our static typechecking is less precise than you would
prefer. The example he gives is rather artificial. He says that further
work is needed to see whether false negatives like this will turn out to be
a practical problem; in other words, will they be so frequent that people
will ignore the useful results of static typechecking?
Looking at the way most schemas and DTDs are written, I don't think that
Dan's example is going to be so frequent that it would cause real problems,
and the type inference associated with most expressions makes sense to me.
But the best way to answer Dan's question, IMHO, is to get more
implementation experience with the static type system, and there are a few
more wrinkles that I would like to see fixed in the very near future.
People need to play with real queries with real DTDs and schemas and see
how useful the static type checking is for them in practice. The theorists
have spent a lot of time creating this system, but we have not had much of
a chance to kick the tires. I have spent a little time kicking the tires by
looking at the queries I write and the DTDs or schemas associated with
them, and my initial impression is that static typechecking will be very
useful.
>For instance, how can static typechecking work for
>schemas that use identity constraints like xs:unique? I fail to see how one
>can guarantee that the following expression
>
>f = <results>
>FOR $x in /employee/age
>RETURN
> <employee-id>{
> $x * 13
> }</employee-id>
></results>
>
>will only return unique values for <employee-id> via type inference.
Static typechecking involves structure, not values. Uniqueness constraints
are defined in terms of values. Static typechecking likewise ignores the
facets of simple types in XML Schema, so violations of facets also go
undetected by the static type system. But practical static type systems
that also take values into account are well beyond the state of the art.
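A small Python sketch of Dare's example shows why structure alone says nothing about uniqueness: the result always has the same shape, a sequence of integer employee-id values, whatever the input, yet two employees of the same age produce the same id. The function names and the input ages are hypothetical.

```python
from collections import Counter
from typing import List

def employee_ids(ages: List[int]) -> List[int]:
    """Mimics Dare's query: one employee-id per age, computed as age * 13."""
    return [age * 13 for age in ages]

def structurally_valid(ids: List[int]) -> bool:
    """Structural check: zero or more employee-ids, each an integer."""
    return all(isinstance(i, int) for i in ids)

def duplicate_ids(ids: List[int]) -> List[int]:
    """Value check an xs:unique constraint needs: ids occurring more than once."""
    return sorted(v for v, count in Counter(ids).items() if count > 1)

ages = [34, 41, 34]             # hypothetical input: two employees aged 34
ids = employee_ids(ages)
print(structurally_valid(ids))  # True: the shape is always fine
print(duplicate_ids(ids))       # [442]: uniqueness depends on the data
```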
> > 4. Dan concludes that type inference is still the most promising approach
> > to static typechecking, but that further work needs to be done on its
> > applicability and limitations.
> >
> > My own take on this is that static type checking using type inference is
> > very promising, and seems to work well in theory for the kinds of queries I
> > have looked at. An implementation using the current type system did catch
> > interesting errors for me. I think our type system needs further work, and
> > we need more practical experience using implementations that do static type
> > checking. This is one of the highest priorities for me personally.
>
>I'm interested in the type of queries you've looked at. I'm not convinced that
>this is as straightforward a problem as you've implied but readily admit that
>my theoretical CS skills are nowhere near excellent so you may be right and
>all I need is a little convincing.
1. Mismatches in content models
Suppose the input DTD uses the following definition of author:
<!ELEMENT author (#PCDATA)>
The output DTD uses the following definition of author:
<!ELEMENT author (first, last)>
The result of the following query does not validate against the output
DTD, and the static type system will catch this:
//author
Any number of common errors can result in mismatches in content models:
mistyping the name of an element or forgetting which name is used for it in
the DTD or schema, putting elements in the wrong order, and so on. These are
basically the same kinds of errors you typically get when validating with a
DTD or schema, except that the static type system cannot detect things
related to values, such as identity constraints or facets of XML Schema
simple types.
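The static type system reaches its verdict from the two DTDs alone, without looking at any document. As a rough dynamic analogue, here is a Python sketch using only the standard library that checks concrete author elements against the output content model; the sample names and helper functions are hypothetical, and encoding (first, last) as a regex over child tag names is an assumption of the sketch.

```python
import re
import xml.etree.ElementTree as ET

# Output DTD: <!ELEMENT author (first, last)>, as a regex over child tag names.
OUTPUT_AUTHOR = re.compile(r"first last")

def has_text(elem: ET.Element) -> bool:
    """True if the element directly contains non-whitespace character data."""
    if (elem.text or "").strip():
        return True
    return any((child.tail or "").strip() for child in elem)

def valid_output_author(elem: ET.Element) -> bool:
    """Element-only content: no character data, children exactly first, last."""
    children = " ".join(child.tag for child in elem)
    return not has_text(elem) and OUTPUT_AUTHOR.fullmatch(children) is not None

# What //author returns under the input DTD: <!ELEMENT author (#PCDATA)>
from_input = ET.fromstring("<author>Jane Doe</author>")
# What the output DTD expects.
expected = ET.fromstring("<author><first>Jane</first><last>Doe</last></author>")

print(valid_output_author(from_input))  # False
print(valid_output_author(expected))    # True
```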
2. Depending on optional content
Suppose we have the following employee element in our input DTD:
<!ELEMENT employee (name, salary, percent-bonus?)>
Our output DTD shows the computed bonus for each employee, which may be
zero, but must be present:
<!ELEMENT employee (name, bonus)>
Now consider the following query:
for $e in //employee
return
<employee>
{
$e/name,
$e/salary * $e/percent-bonus
}
</employee>
Oops, that has the wrong structure, because salary * percent-bonus evaluates
to a value, not to an element. Fortunately, as we showed in (1), the static
type system catches this error. Let's try again:
for $e in //employee
return
<employee>
{
$e/name,
<bonus>{ $e/salary * $e/percent-bonus }</bonus>
}
</employee>
Suppose we have a schema, rather than a DTD, and bonus is defined as a
decimal value which may not be null. In XQuery, if percent-bonus evaluates
to an empty sequence, then the product of salary and the empty sequence is
an empty sequence. For any employee that did not have a percent-bonus in
the input, there is no bonus in the output, so this does not pass the
static type check. Let's fix the query using if-empty() to supply a default
value of zero for percent-bonus:
for $e in //employee
return
<employee>
{
$e/name,
<bonus>{ $e/salary * if-empty($e/percent-bonus, 0) }</bonus>
}
</employee>
That now passes static type checking. Incidentally, note that it is quite
possible that we would have missed the second error in our testing, because
our test files might have supplied percent-bonus for all employees. A
customer site, using data corresponding to the same DTD, could have
encountered the error we missed. Static type checking caught the error
without needing a test that exposes this dependency.
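No test data is needed because the check works on types, not values. A deliberately tiny Python sketch of the idea tracks only occurrence indicators (exactly one versus zero or one); the function names are hypothetical, and the model is far simpler than XQuery's actual static semantics.

```python
# Occurrence indicators in this sketch: "1" = exactly one, "?" = zero or one.

def times(left: str, right: str) -> str:
    """Arithmetic yields () if either operand is (), so "1" only if both are."""
    return "1" if left == "1" and right == "1" else "?"

def if_empty(arg: str, default: str) -> str:
    """if-empty(arg, default) returns default when arg is empty."""
    return "1" if arg == "1" or default == "1" else "?"

SALARY = "1"          # salary is required in the input DTD
PERCENT_BONUS = "?"   # percent-bonus? is optional
LITERAL_ZERO = "1"    # the constant 0
REQUIRED_BONUS = "1"  # the output schema requires exactly one decimal

def check(label: str, inferred: str) -> str:
    verdict = "ok" if inferred == REQUIRED_BONUS else "type error"
    return f"{label}: inferred {inferred}, {verdict}"

print(check("salary * percent-bonus", times(SALARY, PERCENT_BONUS)))
# salary * percent-bonus: inferred ?, type error
print(check("salary * if-empty(percent-bonus, 0)",
            times(SALARY, if_empty(PERCENT_BONUS, LITERAL_ZERO))))
# salary * if-empty(percent-bonus, 0): inferred 1, ok
```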
3. Forgetting the curly braces
Suppose I forgot the curly braces in the above example:
<bonus> $e/salary * if-empty($e/percent-bonus, 0) </bonus>
This element constructor has string content, which does not match the type
of a bonus element in my schema.
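The static checker rejects the braceless version because the constructor's content is a string rather than a decimal; it never evaluates anything. To show why that text can never be a valid bonus, here is a Python sketch that tests element content against the lexical form of xs:decimal. The regex is this sketch's transcription of that lexical space, and the computed value is made up.

```python
import re

# Lexical form of xs:decimal: optional sign, digits with an optional fraction.
DECIMAL = re.compile(r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)")

def valid_bonus(content: str) -> bool:
    """Check element content against xs:decimal after trimming whitespace."""
    return DECIMAL.fullmatch(content.strip()) is not None

# Without braces the "expression" is just literal character data.
without_braces = " $e/salary * if-empty($e/percent-bonus, 0) "
# With braces the content is the computed value (hypothetical result here).
with_braces = "5000.00"

print(valid_bonus(without_braces))  # False
print(valid_bonus(with_braces))     # True
```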
4. Confusing a URL with the resource to which it points
This is an error I made several times while developing the RDF library for
the Syntactic Web paper. Some of my functions returned URLs, others
returned RDF descriptions, and sometimes I changed the return type of a
function without remembering to change all function calls using that
function. Since a URL and a resource have different structure, static type
checking catches this error.
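In a language with declared types the same mistake is caught the same way. Here is a Python sketch in which a URL and the description it points to are distinct types; the class names, the dc:title property and the example URL are hypothetical, not taken from the Syntactic Web paper's library.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass(frozen=True)
class Url:
    """Just an address that points at a resource."""
    href: str

@dataclass
class Description:
    """Hypothetical RDF-style description of the resource a Url points to."""
    about: Url
    properties: Dict[str, str] = field(default_factory=dict)

def title_of(desc: Description) -> str:
    # A static checker such as mypy flags a Url argument here from the
    # annotation alone; the isinstance test is the runtime equivalent.
    if not isinstance(desc, Description):
        raise TypeError(f"expected Description, got {type(desc).__name__}")
    return desc.properties.get("dc:title", "")

home = Url("http://example.org/paper")
desc = Description(about=home, properties={"dc:title": "Example"})

print(title_of(desc))  # Example
try:
    title_of(home)     # the mistake: passing the URL instead of the description
except TypeError as err:
    print(err)         # expected Description, got Url
```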
I hope this gives the basic flavor. I think this is a very useful degree
of error checking, and these are errors that our current static type system
already reports.
Jonathan