   Re: [xml-dev] XPath 1.5? (was RE: [xml-dev] typing and markup)


At 03:35 PM 5/8/2002 +0100, Jeni Tennison wrote:
>Hi Jonathan,
>
> > Static analysis never costs anything at run time. Static analysis
> > can also be used to rewrite code to be more efficient at run time.
> > So static typing, the typing people seem to be complaining about the
> > most, never degrades performance, and often helps performance.
>
>I think that David's point about the differences between how XQuery
>and XSLT are used is pertinent here. XSLT stylesheets tend to be
>compiled afresh each time they are used, whereas (possibly) XQuery
>queries can be compiled once and operate over the same set of data
>multiple times. I think that the reason people think static typing
>degrades performance is because in XSLT it's usually *part of* run
>time -- compilation and execution are part of the same process.

I am very much speaking from an XQuery perspective.

Does static analysis make sense for XSLT? I haven't spent enough time 
thinking seriously about the question to have an opinion. Still, these are 
choices made by each individual implementation, not by the spec. If an 
optimization takes more time to compute than it saves, the implementation 
should not do that optimization.

>Certainly there are cases where stylesheets are compiled once and used
>multiple times, particularly on servers. However, that's not a
>plausible model with client-side transformation, which is one of the
>growth areas of XSLT.

This makes sense to me.

> > Dynamic typing has a more direct influence on run time performance.
> > For any system that actually stores typed data, including relational
> > databases and those native XML databases that use XML Schema,
> > representing typed data directly in the XML view is probably less
> > overhead at run time. If you are going to use data with typed
> > operations such as arithmetic operations, then an untyped approach
> > means you need a run-time cast, which is not needed if your data is
> > typed. A system that knows the datatypes associated with data can
> > use that information to optimize the way it is stored and used.
>
>There's a distinction, in my mind at least, between optimisation based
>on simple types and optimisation based on complex types, and more
>subtly on optimisation based on built-in types and optimisation based
>on user-defined types.

Most of the standard techniques for type-based optimization are based on 
the most common types, which are simple types. I do think that there are 
at least some optimizations that can be done based on knowing the complex 
types - an example I have given previously is that existential 
quantification can sometimes be replaced by a simple comparison.
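
To make that concrete (the element names here are invented for 
illustration): a general comparison is implicitly existential, so

         //invoice[price = 100]

is true if any price child of the invoice equals 100, as if it had 
been written

         //invoice[some $p in price satisfies $p = 100]

If the schema guarantees that each invoice has exactly one price, the 
optimizer can discard the quantification and compile the predicate as a 
single comparison, with no iteration over a sequence.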

I think that optimization of // is a more compelling way to use knowledge 
of complex types. Suppose you have a pattern like this:

         //address

Without knowledge of the complex types involved, this requires examination 
of all elements in the document to see if they are "address" elements. 
Looking at the schema for a particular invoice document, it is easy to see 
that the above pattern can only match shipping or billing addresses found 
in customers. The optimizer can rewrite the above pattern as follows:

         /customer/billing/address | /customer/shipping/address

In at least some environments, this will be much more efficient to execute. 
Incidentally, the user does not see whether an implementation does this 
rewrite; the user only sees the increase in speed. Implementations should 
feel free to do whatever static optimizations they can, but they are not 
required to. Vendors will want to make their implementations fast.
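
For concreteness, here is a sketch of the sort of schema I have in mind 
for this example (the names are invented to match the paths above; I am 
not quoting a real schema):

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="billing" type="AddressHolder"/>
        <xs:element name="shipping" type="AddressHolder"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!-- address occurs only inside billing or shipping;
       its content is simplified to a string here -->
  <xs:complexType name="AddressHolder">
    <xs:sequence>
      <xs:element name="address" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

Given this content model, an optimizer can prove that //address matches 
nothing outside the two paths in the rewrite.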

The user is only affected when it comes to correctness in the use of 
complex types. Let's segue to XQuery for a second. Suppose we have the same 
schema mentioned above, and the user writes the following function:

define function customer-address(element customer $c)
   returns element address
{
      $c/address
}

Static type checking will report that $c/address evaluates to an empty 
sequence, because the address element is always found in a billing or 
shipping element within customer. Static type checking is optional, but if 
the user asks for it, the system tells the user what is wrong with this query.

If the user did not do static type checking, this would be discovered at 
run time, not during static analysis.
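
One possible fix (my variant, using the schema sketched above) navigates 
through the intermediate element, so that the static type of the body 
matches the declared return type:

define function customer-billing-address(element customer $c)
   returns element address
{
      $c/billing/address
}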

I don't know how XSLT intends to handle the typing of templates like this:

<xsl:template match="customer/address">
    . . .
</xsl:template>

>I can perfectly well see the argument for optimisation based on
>built-in simple types. If you look at the implementations of XPath,
>Expression objects usually have "evaluateAsBoolean" values, for
>example, which enable you to make shortcuts such as not retrieving an
>entire node set when you only need to check if it contains more than
>one node. That's great, and I wouldn't do without the speed increases
>it brings.
>
>But from what I can see, the same kind of operation on user-defined
>types, particularly on complex types, is going to be a lot harder. I'm
>not an implementer, but I imagine it would take a lot of work during
>compilation to create classes for different kinds of elements so that
>you can take advantage of their particular features, such as testing
>whether an element can have a particular child before trying to
>retrieve it. The reason it's worthwhile doing this for the built-in
>types is precisely because they're built in.

I suspect that rewriting // is likely to be helpful for XSLT, not just for 
XQuery. I suspect there are other useful optimizations - and I also suspect 
that we will be working on the issue of optimizing XQuery and XPath using 
type information for some time. Providing more information, rather than 
less, to query optimizers will help optimization.
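
For instance (my sketch, reusing the invented schema above), an XSLT 
processor with schema knowledge could apply the same rewrite to a select 
expression:

<xsl:apply-templates select="//address"/>

<!-- rewritten using knowledge of the complex types: -->
<xsl:apply-templates select="/customer/billing/address |
                             /customer/shipping/address"/>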

>I think it comes down to what advantage it gives you to treat a 'foo'
>element as a 'foo' element rather than a generic element node, and
>whether that advantage is worth the cost of schema analysis and the
>extra time and memory it takes to have fooElementNode objects, bearing
>in mind that the compile time and run time costs have to directly
>offset each other with XSLT.
>
>If it ends up being roughly equal, or if the analysis time is greater
>than the time you save due to optimisation, which is what I suspect,
>then the question is what's the point of the strong typing for complex
>types?

If you suspect this, then your implementation should not do these 
optimizations.

On the other hand, type correctness is required by the spec.

>Especially as there are lots of *disadvantages*, such as the
>added complexity in the processors and in the spec to deal with all
>the different kinds of casting and validating of complex types.

I would like to see more information on the added complexity people 
anticipate in processors. Since static analysis is optional, it adds no 
overhead if it is omitted. Optimization based on static analysis is also 
optional, and nobody should implement an optimization that does not 
actually make things faster.

The spec is more complex with types than without types. That's clear.

Jonathan





 
