From: "Michael Fitzgerald" <firstname.lastname@example.org>
> Fair enough. I don't really smell conspiracy here. I hope that it will not
> be technically or politically impossible for vendors to use RELAX NG
> provided that RELAX NG does the right thing wrt type assignment.
Ah, but there's the rub. Is type assignment indeed the right thing?*
And if/where it is, should it be used by augmenting the infoset (TAI) or
by inferring types onto a query?
It is supposed to be for efficiency (among other things). But look at
some applications that need efficiency, such as SVG. An optimised
SVG application would presumably store its data in native data structures that
reflect the actual datatypes used by SVG, rather than necessarily the ones
provided by XQuery.**
Let's say we have XML documents that have been validated by schema language(s)
that do not do type assignment or create any type-augmented infoset.
If we look at the use cases for XQuery, it is hard to find any that
could not be expressed by dynamic typing (casts). Such as
[date(end_date) >= date("1999-03-01") and date(end_date) <= date("1999-03-31")]
instead of
[end_date >= date("1999-03-01") and end_date <= date("1999-03-31")]
in example 188.8.131.52 Q8
The type-augmented infoset (TAI) is just one approach: it is not efficient enough to
fit well with specialist datatypes, and may be heavyweight for simple
processing of XML, especially since a user can add their own casts, to
make their assumptions explicit as above. It would be great if XQuery had
a conformance level that does not require a TAI but can just use casts.
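
To make the cast approach concrete outside XQuery, here is a minimal sketch in
Python of the same date-range test run over an untyped document. It is only an
illustration: the instance data, the item and id element names, and the helper
functions are all made up for the example; only end_date and the two dates come
from the query above.

```python
from datetime import date
import xml.etree.ElementTree as ET

# Hypothetical instance data; everything except end_date is invented here.
DOC = """
<items>
  <item><id>1</id><end_date>1999-02-27</end_date></item>
  <item><id>2</id><end_date>1999-03-15</end_date></item>
  <item><id>3</id><end_date>1999-03-31</end_date></item>
  <item><id>4</id><end_date>1999-04-02</end_date></item>
</items>
"""

def ends_in_march_1999(item):
    # The explicit cast: the untyped string becomes a date only here,
    # inside the query, not in any type-augmented infoset.
    end = date.fromisoformat(item.findtext("end_date").strip())
    return date(1999, 3, 1) <= end <= date(1999, 3, 31)

def query(xml_text):
    # Return the ids of all items whose end_date falls in March 1999.
    root = ET.fromstring(xml_text)
    return [item.findtext("id") for item in root.iter("item")
            if ends_in_march_1999(item)]

matching_ids = query(DOC)
```

The parser hands back plain strings; the assumption that end_date holds a date
lives in one visible place in the query, and a malformed value fails at the cast.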
One spanner in the works is equivalence classes: if a document extends a
schema by declaring some equivalent elements, then a Query using the TAI and written
against the base schema may still work. (Of course, this is nothing that cannot be done
using architectural forms processing too, more cumbersomely.) I am hard-pressed
to think of another place where a TAI is actually required (I would not be surprised
if there are others), but are equivalence classes really so wonderful that they
justify unambiguous type assignment in other schema languages? (And if the
other schema language does not support an equivalent of substitution classes,
they provide no justification at all!)
If we look at RELAX NG, we see that it allows ambiguity. But is this enough
to prevent validation of queries, or even inferring the types in a query? I don't
really think so, because query code can be made to sort out the ambiguity.
It just makes the query a little more complicated, but someone who makes
an ambiguous content model where this kind of problem might arise would
be asking for trouble.***
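
As a rough sketch of what "query code sorting out the ambiguity" could look
like, here is the ( E?, E?) case from the third footnote written in Python.
The two value sets are taken from that footnote; the function names and the
"prefer" policy argument are made up for illustration, and the sketch ignores
element position, since with a single E in the instance either pattern could
be the one that matched.

```python
import xml.etree.ElementTree as ET

FIRST_E_VALUES = {"1", "2", "3"}    # first E: A is read as an integer
SECOND_E_VALUES = {"1", "X", "Y"}   # second E: A is read as a token

def candidate_types(value):
    """Return every reading of A that the two patterns permit."""
    types = []
    if value in FIRST_E_VALUES:
        types.append("integer")
    if value in SECOND_E_VALUES:
        types.append("token")
    return types

def read_a(element, prefer="integer"):
    """Apply the query's own policy when more than one reading is possible."""
    value = element.get("A")
    types = candidate_types(value)
    if not types:
        raise ValueError(f"A={value!r} matches neither pattern")
    # Use the preferred reading if it is allowed, else the only one left.
    chosen = prefer if prefer in types else types[0]
    return int(value) if chosen == "integer" else value

e = ET.fromstring('<E A="1"/>')
as_integer = read_a(e)                  # policy: treat the ambiguous value as an integer
as_token = read_a(e, prefer="token")    # policy: treat it as a token
```

Only A="1" is ambiguous; for every other permitted value the reading is forced,
so the policy argument matters in exactly one case.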
Another objection is that users can derive their own datatypes. But, obviously,
for a valid document a query can be made in terms of the built-in base
type. So I don't see why (except for fallback) user-derived datatypes
offer any (non-performance-related) reason why a TAI must be used.
So is type assignment the right thing? No, it is *a* right thing which allows
some kinds of efficiency, integrity-checking and tool-integration. But not
having to schema-process the document or require a TAI may be more
efficient for other uses.
> Even if XQuery (and XSLT2/XPath2) accept PSVI constructed by means other
> than W3C Schema as input documents, the drafts currently have strong
> bias towards W3C Schema in that the validate expression requires (at my
> reading at least) a W3C schema and similarly the types statically known
> to the query are specified by schema import which again has to be
> W3C schema, I think.
So a profile of XQuery that used type inferencing or casts rather than
a TAI should not provide data() and validate(). Or, better, allow other
schema languages. For example, does their validate() allow embedded
validation languages such as Schematron? If so, is there any
real difference between allowing a Schematron schema directly
and allowing it embedded in a dummy XML Schema carrier that allows
any types anywhere?
* Please no tedious flames quoting Richard Gabriel
** Please no tedious flames if this is wrong in some case: the key word
*** I may be getting over my head here: I am thinking of a RELAX NG
grammar which allows ( E?, E?) where the first E has an attribute A
allowing integers "1"|"2"|"3" and the second E has an attribute A
that allows token values "1"|"X"|"Y" and the instance just has <E A="1"/>.
In that case, the query has to decide the policy to use. No big fuss, since
it is a silly thing to do, and seems hardly a showstopper against RELAX NG.