At 03:35 PM 5/8/2002 +0100, Jeni Tennison wrote:
>Hi Jonathan,
>
> > Static analysis never costs anything at run time. Static analysis
> > can also be used to rewrite code to be more efficient at run time.
> > So static typing, the typing people seem to be complaining about the
> > most, never degrades performance, and often helps performance.
>
>I think that David's point about the differences between how XQuery
>and XSLT are used is pertinent here. XSLT stylesheets tend to be
>compiled afresh each time they are used, whereas (possibly) XQuery
>queries can be compiled once and operate over the same set of data
>multiple times. I think that the reason people think static typing
>degrades performance is because in XSLT it's usually *part of* run
>time -- compilation and execution are part of the same process.
I am very much speaking from an XQuery perspective.
Does static analysis make sense for XSLT? I haven't spent enough time
thinking seriously about the question to have an opinion. Still, these are
choices made by each individual implementation, not by the spec. If an
optimization takes more time to compute than it saves, the implementation
should not do that optimization.
>Certainly there are cases where stylesheets are compiled once and used
>multiple times, particularly on servers. However, that's not a
>plausible model with client-side transformation, which is one of the
>growth areas of XSLT.
This makes sense to me.
> > Dynamic typing has a more direct influence on run time performance.
> > For any system that actually stores typed data, including relational
> > databases and those native XML databases that use XML Schema,
> > representing typed data directly in the XML view is probably less
> > overhead at run time. If you are going to use data with typed
> > operations such as arithmetic operations, then an untyped approach
> > means you need a run-time cast, which is not needed if your data is
> > typed. A system that knows the datatypes associated with data can
> > use that information to optimize the way it is stored and used.
>
>There's a distinction, in my mind at least, between optimisation based
>on simple types and optimisation based on complex types, and more
>subtly on optimisation based on built-in types and optimisation based
>on user-defined types.
Most of the standard techniques for type-based optimization are built around
the most standard types, which are simple types. But I do think that there
are at least some optimizations that can be done based on knowing the
complex types. One example I have given previously is when existential
quantification can be replaced by a simple comparison.
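To illustrate (a hypothetical sketch - the element names are mine, not from
any real schema), consider a predicate like this:
//item[price > 100]
In the general case the comparison is existentially quantified: it is true
if any price child of the item is greater than 100, so the processor must be
prepared to loop over a sequence of price nodes. If the schema guarantees
that each item has exactly one price child, the predicate reduces to a
single scalar comparison, with no quantification loop at all.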
I think that optimization of // is a more compelling way to use knowledge
of complex types. Suppose you have a pattern like this:
//address
Without knowledge of the complex types involved, this requires examination
of all elements in the document to see if they are "address" elements.
Looking at the schema for a particular invoice document, it is easy to see
that the above pattern can only match shipping or billing addresses found
in customers. The optimizer can rewrite the above pattern as follows:
/customer/billing/address | /customer/shipping/address
In at least some environments, this will be much more efficient to execute.
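For concreteness, here is a minimal sketch of the kind of schema I have in
mind (purely illustrative - the names are mine, not from any published
schema):
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="AddressedParty">
    <xs:sequence>
      <xs:element name="address" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="billing" type="AddressedParty"/>
        <xs:element name="shipping" type="AddressedParty"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Since address can occur only inside billing or shipping, which in turn occur
only inside customer, the rewrite is safe.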
Incidentally, the user does not see whether an implementation does this
rewrite; the user only sees the increase in speed. Implementations should
feel free to do whatever static optimizations they can, but they are not
required to. Vendors will want to make their implementations fast.
The user is only affected when it comes to correctness in the use of
complex types. Let's segue to XQuery for a second. Suppose we have the same
schema mentioned above, and the user writes the following function:
define function customer-address(element customer $c)
  returns element address
{
  $c/address
}
Static type checking will report that $c/address evaluates to an empty
sequence, because the address element is always found in a billing or
shipping element within customer. Static type checking is optional, but if
the user asks for it, the system tells the user what is wrong with this query.
If the user did not do static type checking, this would be discovered at
run time, not during static analysis.
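Under that schema, a corrected version of the function might read as follows
(a sketch in the working draft syntax - the exact occurrence indicator may
differ, but the body now selects two address elements, so the declared
return type must widen to match):
define function customer-address(element customer $c)
  returns element address+
{
  $c/billing/address | $c/shipping/address
}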
I don't know how XSLT intends to handle the typing of templates like this:
<xsl:template match="customer/address">
. . .
</xsl:template>
Under the schema above, that match pattern can never match any element.
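For comparison (still against the illustrative schema above), a pattern that
can actually match would be:
<xsl:template match="billing/address | shipping/address">
. . .
</xsl:template>
Whether a processor should warn about the first template, rewrite it, or
silently let it match nothing is part of what I don't know.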
>I can perfectly well see the argument for optimisation based on
>built-in simple types. If you look at the implementations of XPath,
>Expression objects usually have "evaluateAsBoolean" values, for
>example, which enable you to make shortcuts such as not retrieving an
>entire node set when you only need to check if it contains more than
>one node. That's great, and I wouldn't do without the speed increases
>it brings.
>
>But from what I can see, the same kind of operation on user-defined
>types, particularly on complex types, is going to be a lot harder. I'm
>not an implementer, but I imagine it would take a lot of work during
>compilation to create classes for different kinds of elements so that
>you can take advantage of their particular features, such as testing
>whether an element can have a particular child before trying to
>retrieve it. The reason it's worthwhile doing this for the built-in
>types is precisely because they're built in.
I suspect that rewriting // is likely to be helpful for XSLT, not just for
XQuery. I suspect there are other useful optimizations as well - and I also
suspect that we will be working on the issue of optimizing XQuery and XPath
using type information for some time. Giving query optimizers more
information, rather than less, will help them do their job.
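To make that concrete in XSLT terms (a sketch, reusing the illustrative
schema above), a schema-aware processor could treat
<xsl:apply-templates select="//address"/>
as if the stylesheet author had written
<xsl:apply-templates select="/customer/billing/address | /customer/shipping/address"/>
avoiding a walk over every element in the document.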
>I think it comes down to what advantage it gives you to treat a 'foo'
>element as a 'foo' element rather than a generic element node, and
>whether that advantage is worth the cost of schema analysis and the
>extra time and memory it takes to have fooElementNode objects, bearing
>in mind that the compile time and run time costs have to directly
>offset each other with XSLT.
>
>If it ends up being roughly equal, or if the analysis time is greater
>than the time you save due to optimisation, which is what I suspect,
>then the question is what's the point of the strong typing for complex
>types?
If you suspect this, then your implementation should not do these
optimizations.
On the other hand, type correctness is required by the spec.
>Especially as there are lots of *disadvantages*, such as the
>added complexity in the processors and in the spec to deal with all
>the different kinds of casting and validating of complex types.
I would like to see more information on the added complexity people
anticipate in processors. Since static analysis is optional, it adds no
overhead when it is omitted. Optimization based on static analysis is also
optional, and nobody should implement an optimization that does not actually
optimize.
The spec is more complex with types than without types. That's clear.
Jonathan