OASIS Mailing List Archives

   Re: [xml-dev] XPath/XSLT 2.0 concerns


Hi Robin,

By the way, I really appreciate you giving your implementer's-eye view
on these issues, and that you're not trying to enforce types on
everyone just 'cos you're interested in seeing how they work. Below,
I've argued against what you think *might* be the benefits of
optimisation as though the only choice were between strongly typed
*or* weakly typed XPath, just because it's easier to make my points
against strong typing that way. But I agree with you that having
*both* in a layered manner is the win-win situation.

>> I know that's something that people claim quite a lot, but I don't
>> think that it's at all easy for an implementation to carry out that
>> level of optimisation, and I'm skeptical about whether you would
>> actually get the speed-up you're looking for.
> I did not claim that it's easy :) Likewise, I will not say that I am
> _convinced_ that there will be a measurable speed-up because I don't
> have empirical evidence handy.

Fair enough :) It does seem to me, though, that if a technology is
going to be pulled and pushed and generally contorted and made more
complex by the addition of strong data typing then it ought to be for
a reason that *has* got empirical evidence behind it.

XSLT processors could do the kind of optimisation you're talking about
*right now*, based on DTDs. Do they? If they don't, then why not? If
they *do*, then I think we could learn a lot from the optimisations
that they've been able to put in place. Perhaps Mike Kay can describe
what Saxon does?

> I do believe however that it's a track worth exploring. The use case
> set I'm considering concerns small devices on which even simple XSLT
> is slow and speedups that wouldn't be noticeable on your average
> desktop, but make a big difference in those situations. Given a
> content model in which type B can be contained many times in type A,
> if you have no template matching B, when you see A you can skip
> ahead. Given a sufficient number of Bs, I think the difference may
> be seen.

Well... not if you have *no* template matching B, 'cos if you have
*no* template matching B then the default templates kick in and you
have to process B's children. Perhaps if you have an *empty* template
for B *and* no other templates that might be matched by any of the Bs
(e.g. things like *[C]) then you could skip over A. But working out
whether that's the case or not would probably involve checking each B
against the templates that are available anyway, so I don't think it
would really help.
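
To make the point about default templates concrete, here is a toy
Python model of how template dispatch behaves. It is only a sketch: it
matches on element name alone, and the names apply_templates,
empty_template and the sample document are made up for illustration.
The fallback branch mimics XSLT 1.0's built-in rules (recurse into
children, copy text through).

```python
import xml.etree.ElementTree as ET

def apply_templates(node, templates, visited):
    """Toy model of xsl:apply-templates; patterns are element names only."""
    visited.append(node.tag)  # every node reached must be checked against the rules
    rule = templates.get(node.tag)
    if rule is not None:
        return rule(node, templates, visited)
    # No matching template: behave like the built-in rules, i.e. copy
    # text through and keep applying templates to the children.
    out = node.text or ""
    for child in node:
        out += apply_templates(child, templates, visited)
        out += child.tail or ""
    return out

def empty_template(node, templates, visited):
    """Equivalent of <xsl:template match="B"/>: produce nothing, do not descend."""
    return ""

doc = ET.fromstring("<A><B><C>x</C></B><B><C>y</C></B><D>z</D></A>")

# No template for B: the built-in rule descends into every B.
visited_default = []
result_default = apply_templates(doc, {}, visited_default)

# Empty template for B: B's children are skipped, but each B is still matched.
visited_empty = []
result_empty = apply_templates(doc, {"B": empty_template}, visited_empty)

print(result_default, visited_default)
print(result_empty, visited_empty)
```

Note that even with the empty template, each B still shows up in the
visited list, because it has to be tested against the available
templates before it can be ignored.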

> Similarly, a "stupid" query like "//*/@foo" (say, in a for-each) can
> probably be optimized away if you know that foo can only appear in
> condition Bar.

Yes; if you know that the document is completely valid, of course. But
if someone has included a "stupid" query, this should be something
detected *at design time*, by lint-style checkers, not optimised away
*at run time*.
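
A design-time check of this kind might look something like the
following Python sketch. Everything in it is an assumption made for
illustration: ALLOWED_ATTRS stands in for whatever summary a real tool
would extract from a DTD or schema, and lint_xpath only recognises the
literal //*/@name shape rather than parsing XPath properly.

```python
import re

# Hypothetical schema summary: element name -> attributes it may carry.
ALLOWED_ATTRS = {"Bar": {"foo", "id"}, "Baz": {"id"}, "Qux": set()}

# Recognises only the exact shape //*/@name (not a general XPath parser).
WILDCARD_ATTR = re.compile(r"^//\*/@([A-Za-z_][\w.-]*)$")

def lint_xpath(expr, allowed=ALLOWED_ATTRS):
    """Return a warning for a wildcard attribute path, or None if nothing to report."""
    m = WILDCARD_ATTR.match(expr)
    if not m:
        return None
    attr = m.group(1)
    owners = sorted(name for name, attrs in allowed.items() if attr in attrs)
    if not owners:
        return f"{expr}: no element declares @{attr}; it selects nothing in a valid document"
    narrowed = " | ".join(f"//{name}/@{attr}" for name in owners)
    return f"{expr}: @{attr} is only declared on {', '.join(owners)}; consider {narrowed}"

print(lint_xpath("//*/@foo"))
print(lint_xpath("//*/@id"))
print(lint_xpath("//Bar/@foo"))
```

The suggested rewrite is only equivalent when the source document is
valid against the schema the summary came from, which is the same
caveat that applies to optimising the query away at run time.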

> I totally agree that it's a rather restricted use case, even if I
> think it may be proven to be an existing one. Its restricted scope
> is part of what makes me think that it should be optional, even
> though I will be using it and even though that means I'll probably
> be one of the poor fellows that have to implement it ;)

Poor you :)

> I also agree that it might be a much better option to optimize the
> stylesheet based on the schema. Chances are I'll be comparing both
> approaches.


>> I'd argue that in a well-designed stylesheet (one that didn't apply
>> templates to or otherwise visit the nodes in the subtrees you want
>> to ignore), the optimisation won't gain you much, if anything.
> Yes but those are rare.

Well-designed stylesheets are *rare*? :( Well, perhaps (sigh, why did
I bother writing those books and answering those emails?), but I don't
see how optimisation will make them any more common. If anything,
because it hides the infelicities of design, it would make badly
designed ones *more* common. Stylesheets could be packed with //*/@foo
tests that have no discernible effect on the result because they're
"optimised away".

> What I'd argue is that it would be very easy to create a stylesheet
> that will defeat any optimisation.

Definitely. In fact, I would imagine that most stylesheet authors will
do this without even knowing that they're doing it.

Which leads me to the question, again: *why* contort XPath around a
desire for optimisation which (a) hasn't been proven to work in any
cases, (b) probably won't work in most cases, and (c) leads to
programmers making more bad design choices rather than fewer?

> Note that while I used an XSLT example, I am also thinking of
> generic XPath requests, where one has a reverse approach to
> apply-templates based processing.

Right -- I assumed that you were talking about *selecting* nodes
rather than *matching* them. Designing optimisers to help *match*
nodes strikes me as similar to designing optimisers to help match
strings against regular expressions. I'm not sure whether that's *at
all* feasible. For selecting nodes, I can see some possible benefits
(though unproven, and easily and unconsciously subvertible).
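
For the selection case, the kind of pruning under discussion could be
sketched as below. This is an illustration under stated assumptions,
not a description of any real processor: CAN_CONTAIN is a hypothetical
table of which element names may occur anywhere beneath each element,
and the visit counts say nothing about real-world timings.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema knowledge: names that may occur anywhere below each element.
CAN_CONTAIN = {"doc": {"A", "B", "foo"}, "A": {"B"}, "B": set(), "foo": set()}

def select(root, target, can_contain=None):
    """Descendant-or-self search for `target`; returns (matches, nodes visited)."""
    matches, visited = [], 0
    stack = [root]
    while stack:
        node = stack.pop()
        visited += 1
        if node.tag == target:
            matches.append(node)
        # Unknown elements default to "might contain the target", so they are never pruned.
        if can_contain is not None and target not in can_contain.get(node.tag, {target}):
            continue  # the schema says the target cannot occur below this node
        stack.extend(reversed(list(node)))  # keep document order
    return matches, visited

doc = ET.fromstring("<doc><A><B/><B/><B/></A><foo/><A><B/></A></doc>")

plain = select(doc, "foo")                 # walks the whole tree
pruned = select(doc, "foo", CAN_CONTAIN)   # skips the children of every A

print(len(plain[0]), plain[1])
print(len(pruned[0]), pruned[1])
```

The pruned walk is only safe if the document really is valid, and a
query such as //* gives it nothing to prune at all.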

Gosh, I *did* use a lot of emphasis in that message!



Jeni Tennison



Copyright 2001 XML.org. This site is hosted by OASIS