Jeni Tennison wrote:
>>I don't believe that either and I'd add that it takes a pretty
>>narrow view on XML but I can in fact see use cases for having access
>>to types in XPath. For instance when I see an XSLT processor chew
>>for several minutes on a very predictable document (granted, it's
>>Java based, but still) I think that if it had access to schema
>>information it could optimize a lot of what it's doing by skipping
> I know that's something that people claim quite a lot, but I don't
> think that it's at all easy for an implementation to carry out that
> level of optimisation, and I'm skeptical about whether you would
> actually get the speed-up you're looking for.
I did not claim that it's easy :) Likewise, I will not say that I am
_convinced_ that there will be a measurable speed-up because I don't
have empirical evidence handy.
I do believe, however, that it's a track worth exploring. The set of use
cases I'm considering concerns small devices on which even simple XSLT is
slow, and where speedups that wouldn't be noticeable on your average
desktop make a big difference. Given a content model in
which type B can be contained many times in type A, if you have no
template matching B, when you see A you can skip ahead. Given a
sufficient number of Bs, I think the difference may be seen. Similarly,
a "stupid" query like "//*/@foo" (say, in a for-each) can probably be
optimized away if you know that foo can only appear in condition Bar.
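
As a rough illustration of the kind of pruning described above, here is a minimal Python sketch. It is not how any existing XSLT processor works. The content model (SCHEMA), the element names A, B, C and Bar, and both helper functions are hypothetical, invented for this example. The idea is to work out from the content model which element types can have a foo attribute somewhere in their subtree, then skip every subtree that cannot.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model (an assumption for this sketch):
# element name -> (allowed child element names, allowed attribute names)
SCHEMA = {
    "A": ({"B", "Bar"}, set()),
    "B": ({"C"}, set()),
    "C": (set(), set()),
    "Bar": (set(), {"foo"}),
}

def may_contain_attr(schema, attr):
    """Return the element names whose subtree can hold `attr`, per the schema."""
    useful = {name for name, (_, attrs) in schema.items() if attr in attrs}
    changed = True
    while changed:  # propagate upwards until nothing new is added
        changed = False
        for name, (children, _) in schema.items():
            if name not in useful and children & useful:
                useful.add(name)
                changed = True
    return useful

def find_attr(root, attr, useful=None):
    """Collect values of `attr` in document order.

    Returns (values, number of elements visited). When `useful` is given,
    subtrees rooted at an element type outside that set are skipped.
    """
    values, visited = [], 0
    stack = [root]
    while stack:
        elem = stack.pop()
        if useful is not None and elem.tag not in useful:
            continue  # the schema says attr cannot occur below here
        visited += 1
        if attr in elem.attrib:
            values.append(elem.attrib[attr])
        stack.extend(reversed(list(elem)))
    return values, visited

# A very predictable document: many Bs inside one A, and a single Bar.
doc = ET.fromstring("<A>" + "<B><C/><C/></B>" * 1000 + '<Bar foo="x"/></A>')

naive_values, naive_visited = find_attr(doc, "foo")
pruned_values, pruned_visited = find_attr(
    doc, "foo", may_contain_attr(SCHEMA, "foo"))
```

Both calls return the same single value, but the schema-aware one looks at 2 elements instead of 3002. Whether that saving survives the cost of computing the reachable set on a real device is exactly the open question.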
I totally agree that it's a rather restricted use case, even if I think
it may be proven to be an existing one. Its restricted scope is part of
what makes me think that it should be optional, even though I will be
using it and even though that means I'll probably be one of the poor
fellows that have to implement it ;)
I also agree that it might be a much better option to optimize the
stylesheet based on the schema. Chances are I'll be comparing both
> Unless you've got really complicated stylesheets, a large proportion
> of the time spent by an XSLT processor will be on parsing and building
> up the node tree, especially if the document is so large that it has
> to start swapping in order to find enough memory to store it. Having a
> schema available will not help at this level.
I know but that's definitely not the bottleneck in the case I mentioned.
The document is huge and the stylesheet is moderately complex, but
there's plenty of spare memory, the input node tree is only built once
and the output tree is fairly simple. I'm not too concerned about this
specific problem; it was just an illustration. The solution in this case
is probably as simple as replacing Xalan with libxslt.
> Then, as with all these kinds of optimisations, there's the question
> of whether the time taken to perform the inferencing required to do
> the optimisation is actually less than the time it's currently taking
> to do the processing.
The chances are high that the optimisation will happen as early as
possible. Also, in my specific case we already have all the type
information. In fact, that's pretty much all we have.
> I'd argue that in a well-designed stylesheet
> (one that didn't apply templates to or otherwise visit the nodes in
> the subtrees you want to ignore), the optimisation won't gain you
> much, if anything.
Yes but those are rare. What I'd argue is that it would be very easy to
create a stylesheet that will defeat any optimisation.
Note that while I used an XSLT example, I am also thinking of generic
XPath requests, where one has a reverse approach to apply-templates
And in the end I'm only making the case that strongly typed XPath/XSLT
does have interesting use cases (i.e. it shouldn't be dropped completely)
but that those are specific enough that it should happen separately from
the generally useful core bits.
Robin Berjon <firstname.lastname@example.org>
Research Engineer, Expway
7FC0 6F5F D864 EFB8 08CE 8E74 58E6 D5DB 4889 2488