xml-dev - Re: [xml-dev] XPath/XSLT 2.0 concerns


To: Jeni Tennison <jeni@jenitennison.com>
Subject: Re: [xml-dev] XPath/XSLT 2.0 concerns
From: Robin Berjon <robin.berjon@expway.fr>
Date: Wed, 02 Oct 2002 19:01:27 +0200
Cc: xml-dev@lists.xml.org
Organization: Expway
References: <72RLYT04POLJXV95A7FBMJXRM82TRVT.3d9a6f4c@MChamp> <144498584266.20021002103555@jenitennison.com> <3D9B0101.4020504@textuality.com> <3D9B0CB4.6010207@prescod.net> <3D9B13F8.20608@expway.fr> <12522921521.20021002172133@jenitennison.com>
Reply-to: robin.berjon@expway.fr
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.1) Gecko/20020826

Hi Jeni,

Jeni Tennison wrote:
>>I don't believe that either and I'd add that it takes a pretty
>>narrow view on XML but I can in fact see use cases for having access
>>to types in XPath. For instance when I see an XSLT processor chew
>>for several minutes on a very predictable document (granted, it's
>>Java based, but still) I think that if it had access to schema
>>information it could optimize a lot of what it's doing by skipping
>>entire subtrees.
> 
> I know that's something that people claim quite a lot, but I don't
> think that it's at all easy for an implementation to carry out that
> level of optimisation, and I'm skeptical about whether you would
> actually get the speed-up you're looking for.

I did not claim that it's easy :) Likewise, I will not say that I am 
_convinced_ that there will be a measurable speed-up because I don't 
have empirical evidence handy.

I do believe, however, that it's a track worth exploring. The use cases 
I'm considering involve small devices on which even simple XSLT is slow, 
and where speedups that wouldn't be noticeable on your average desktop 
make a big difference. Given a content model in which type B can be 
contained many times in type A, if you have no template matching B, then 
when you see A you can skip ahead past its Bs. Given a sufficient number 
of Bs, I think the difference would be measurable. Similarly, a "stupid" 
query like "//*/@foo" (say, in a for-each) can probably be optimized 
away if you know that foo can only appear in condition Bar.
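
A rough sketch of the subtree-skipping idea, in Python rather than in any 
real XSLT processor. The content model, the type names (A, B, C, title) 
and the helper names are all made up for illustration, and the sketch 
ignores XSLT's built-in template rules, which would normally still copy 
text out of unmatched elements:

```python
# Hypothetical content model: element type -> element types allowed as children.
CONTENT_MODEL = {
    "A": {"B", "title"},
    "B": {"C"},
    "C": set(),
    "title": set(),
}


def reachable(content_model, start):
    """Return start plus every element type that can occur beneath it."""
    seen = set()
    stack = [start]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(content_model.get(current, ()))
    return seen


def skippable_types(content_model, matched):
    """Types whose whole subtree contains nothing that a template matches."""
    return {
        elem_type
        for elem_type in content_model
        if not (reachable(content_model, elem_type) & matched)
    }


# Assumption: the stylesheet only has templates for A and title.
SKIP = skippable_types(CONTENT_MODEL, matched={"A", "title"})
print(sorted(SKIP))  # ['B', 'C']
```

The set is computed once from the schema and the stylesheet, before any 
document is seen; at run time the processor would only need a set lookup 
per element to decide whether to descend.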

I totally agree that it's a rather restricted use case, even if I think 
it may be proven to be an existing one. Its restricted scope is part of 
what makes me think that it should be optional, even though I will be 
using it and even though that means I'll probably be one of the poor 
fellows that have to implement it ;)

I also agree that it might be a much better option to optimize the 
stylesheet based on the schema. Chances are I'll be comparing both 
approaches.

> Unless you've got really complicated stylesheets, a large proportion
> of the time spent by an XSLT processor will be on parsing and building
> up the node tree, especially if the document is so large that it has
> to start swapping in order to find enough memory to store it. Having a
> schema available will not help at this level.

I know, but that's definitely not the bottleneck in the case I mentioned. 
The document is huge and the stylesheet is moderately complex, but 
there's plenty of spare memory, the input node tree is only built once, 
and the output tree is fairly simple. I'm not too concerned about this 
specific problem; it was just an illustration. The solution in this case 
is probably as simple as replacing Xalan with libxslt.

> Then, as with all these kinds of optimisations, there's the question
> of whether the time taken to perform the inferencing required to do
> the optimisation is actually less than the time it's currently taking
> to do the processing.

The chances are high that the optimisation will happen as early as 
possible. Also, in my specific case we already have all the type 
information. In fact, that's pretty much all we have.

> I'd argue that in a well-designed stylesheet
> (one that didn't apply templates to or otherwise visit the nodes in
> the subtrees you want to ignore), the optimisation won't gain you
> much, if anything.

Yes, but those are rare. What I'd argue is that it would be very easy to 
create a stylesheet that will defeat any optimisation.

Note that while I used an XSLT example, I am also thinking of generic 
XPath requests, where the approach is the reverse of 
apply-templates-based processing.
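
For the plain XPath case, here is a minimal sketch of how a query like 
"//*/@foo" might be pruned with schema knowledge. It assumes the input is 
valid against the schema, that "Bar" is an element type (my reading for 
the sake of the example), and that the CHILDREN and ATTRS tables stand in 
for whatever type information is actually available. The function names 
are hypothetical too:

```python
import xml.etree.ElementTree as ET

# Hypothetical schema facts: allowed children and attributes per element type.
CHILDREN = {"doc": {"A", "Bar"}, "A": {"B"}, "B": set(), "Bar": set()}
ATTRS = {"doc": set(), "A": set(), "B": set(), "Bar": {"foo"}}


def may_contain(attr, elem_type, _seen=None):
    """True if elem_type, or any type that can occur beneath it, may carry attr."""
    seen = _seen if _seen is not None else set()
    if elem_type in seen:
        return False  # already examined (or in progress, for recursive models)
    seen.add(elem_type)
    if attr in ATTRS.get(elem_type, set()):
        return True
    return any(may_contain(attr, child, seen) for child in CHILDREN.get(elem_type, ()))


def select_attr(root, attr):
    """Collect attr values as //*/@attr would, skipping subtrees that cannot hold it.

    Returns the values in document order and the number of elements visited.
    """
    values, visited = [], 0
    stack = [root]
    while stack:
        elem = stack.pop()
        if not may_contain(attr, elem.tag):
            continue  # the schema rules out attr here and everywhere below
        visited += 1
        if attr in elem.attrib:
            values.append(elem.attrib[attr])
        stack.extend(reversed(list(elem)))  # reversed keeps document order
    return values, visited


doc = ET.fromstring('<doc><A><B/><B/><B/></A><Bar foo="1"/><Bar foo="2"/></doc>')
print(select_attr(doc, "foo"))  # (['1', '2'], 3)
```

Only 3 of the 7 elements are visited here; the A subtree is never 
entered. A real implementation would cache the may_contain answers per 
type instead of recomputing them for every element.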

And in the end I'm only making the case that strongly typed XPath/XSLT 
does have interesting use cases (i.e., it shouldn't be dropped completely) 
but that those are specific enough that it should happen separately from 
the generally useful core bits.

-- 
Robin Berjon <robin.berjon@expway.fr>
Research Engineer, Expway
7FC0 6F5F D864 EFB8 08CE  8E74 58E6 D5DB 4889 2488

