
 



Re: [xml-dev] Rick Jelliffe's article on XSLT 1.0 performance

> Basically, there are some workloads where compile-time matters (because you're compiling the stylesheet once per transformation),
>
I would not have responded back when you had to pay for an IBM DataPower Gateway to run benchmarks.
But since last year, a Docker version of DataPower has been available for free (for development purposes only).

DataPower provides compilers for XSLT 1.0 (with EXSLT and many proprietary extension functions), XQuery 1.0 (with JSONiq) and GatewayScript (a Node.js flavour).
For XSLT, DataPower compiles to machine code and stores the result in the so-called stylesheet cache. Compile time therefore has to be paid only once, when (re-)compilation is needed (e.g. because the stylesheet has changed or the stylesheet cache has been flushed). All subsequent transformations run the binary code from the stylesheet cache without any compilation overhead.
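To make the compile-once policy concrete in plain JAXP terms, here is a hypothetical analogue: it caches compiled `Templates` objects per stylesheet file and recompiles only when the file's timestamp changes or the cache is flushed. The class name `StylesheetCache` and the timestamp-based invalidation are assumptions of this sketch; it is not DataPower's implementation, which compiles to machine code.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: caches compiled JAXP Templates per stylesheet file.
// DataPower compiles to machine code; this only mimics the caching policy.
final class StylesheetCache {
    // One compiled stylesheet plus the file timestamp it was compiled from.
    private static final class Entry {
        final Templates templates;
        final long lastModified;

        Entry(Templates templates, long lastModified) {
            this.templates = templates;
            this.lastModified = lastModified;
        }
    }

    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Entry> cache = new HashMap<>();
    private int compilations = 0;

    /** Returns the compiled stylesheet, compiling only on a miss or when the file has changed. */
    synchronized Templates get(File xsl) throws TransformerConfigurationException {
        String key = xsl.getAbsolutePath();
        long stamp = xsl.lastModified();
        Entry entry = cache.get(key);
        if (entry == null || entry.lastModified != stamp) {
            // The expensive step: parse and compile the stylesheet.
            entry = new Entry(factory.newTemplates(new StreamSource(xsl)), stamp);
            cache.put(key, entry);
            compilations++;
        }
        // Hits skip compilation entirely; Templates may be shared across threads.
        return entry.templates;
    }

    /** Mimics flushing the stylesheet cache: the next get() recompiles. */
    synchronized void flush() {
        cache.clear();
    }

    /** Number of compilations performed so far (useful to confirm cache hits). */
    synchronized int compilations() {
        return compilations;
    }
}
```

A caller would do `cache.get(xslFile).newTransformer().transform(source, result)` per request, so only the first request (or the first one after a change or flush) pays the compile cost.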

I don't run Windows, so I cannot do the PowerShell test, but you can if you want.


Mit besten Gruessen / Best wishes,

Hermann Stamm-Wilbrandt
Compiler Level 3 support & Fixpack team lead
IBM DataPower Gateways (⬚ᵈᵃᵗᵃ / ⣏⠆⡮⡆⢹⠁⡮⡆⡯⠂⢎⠆⡧⡇⣟⡃⡿⡃)
https://www.ibm.com/developerworks/mydeveloperworks/blogs/HermannSW/
https://twitter.com/HermannSW/ https://stamm-wilbrandt.de/GraphvizFiddle/
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Chairwoman of the Supervisory Board: Martina Koederitz
Management: Dirk Wittkopp
Registered office: Boeblingen
Registration court: Amtsgericht Stuttgart, HRB 243294


From: Michael Kay <mike@saxonica.com>
To: Rick Jelliffe <rjelliffe@allette.com.au>
Cc: XML Developers List <xml-dev@lists.xml.org>
Date: 12.02.2017 10:29
Subject: Re: [xml-dev] Rick Jelliffe's article on XSLT 1.0 performance





>
> I agree that benchmarking the usual expected cases is just as important as benchmarking edge cases prone to blowouts. But I am not so sure why you think we cannot add the numbers up.

Basically, there are some workloads where compile-time matters (because you're compiling the stylesheet once per transformation), and there are others where it doesn't (the typical case being where you're doing server-side transformation and the cost of compiling the stylesheet is amortized over thousands of transformations). Historically, that's the workload we have optimized for (and we're now realising that might have been a mistake). To provide figures that can be extrapolated to those very different kinds of workload, you really need more than one number.
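The two workloads can be captured with a one-line cost model: if one compilation is shared by n transformations, the average cost per transformation is compile/n + transform. The figures in `main` are hypothetical, chosen only to show how the same two measurements lead to very different per-transformation costs, which is why a single number is not enough.

```java
// Simple cost model: one stylesheet compilation amortized over n transformations.
final class CompileAmortization {
    /** Average cost per transformation in ms when one compile is shared by n runs. */
    static double perTransformMs(double compileMs, double transformMs, int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        return compileMs / n + transformMs;
    }

    public static void main(String[] args) {
        // Hypothetical figures for illustration only.
        double compileMs = 200, transformMs = 5;
        // Compile once per transformation (e.g. a command-line run): compile cost counts in full.
        System.out.println(perTransformMs(compileMs, transformMs, 1));
        // Server-side, compiled once and reused: compile cost nearly vanishes.
        System.out.println(perTransformMs(compileMs, transformMs, 1000));
    }
}
```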

> I think it shows that for large documents, the cost of XML parsing is utterly dwarfed by the costs of the in-memory data structures and algorithms used for processing.

That's not what I'm seeing. For a minimal parse of a 119 MB input file (Apache Xerces, SAX parse to a do-nothing ContentHandler), run 10 times to gauge JVM warm-up time, I'm seeing these timings:

Time: 735ms
Time: 555ms
Time: 494ms
Time: 517ms
Time: 486ms
Time: 503ms
Time: 508ms
Time: 497ms
Time: 504ms
Time: 486ms
Average of last 8: 499ms

(From now on, I'll just give the "average of last 8".)
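A minimal sketch of this measurement loop, assuming the JDK's built-in SAX parser (in OpenJDK a repackaged Xerces) and a synthetic in-memory document in place of the 119 MB file; `parseOnceMs` and `averageOfLast` are hypothetical helper names, and reading from memory also leaves out disk I/O. Absolute timings will differ from those quoted.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

final class SaxParseBenchmark {
    /** Parses the document once into a handler that ignores every event; returns elapsed ms. */
    static long parseOnceMs(SAXParserFactory factory, byte[] xml) throws Exception {
        SAXParser parser = factory.newSAXParser(); // created outside the timed region
        long start = System.nanoTime();
        parser.parse(new ByteArrayInputStream(xml), new DefaultHandler()); // DefaultHandler does nothing
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Mean of the last k timings, discarding earlier warm-up runs (integer division truncates). */
    static long averageOfLast(long[] timings, int k) {
        long sum = 0;
        for (int i = timings.length - k; i < timings.length; i++) sum += timings[i];
        return sum / k;
    }

    public static void main(String[] args) throws Exception {
        // Synthetic input: an assumption standing in for the real file.
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < 200_000; i++) sb.append("<item id='").append(i).append("'>text</item>");
        byte[] xml = sb.append("</root>").toString().getBytes(StandardCharsets.UTF_8);

        SAXParserFactory factory = SAXParserFactory.newInstance();
        long[] timings = new long[10];
        for (int run = 0; run < timings.length; run++) {
            timings[run] = parseOnceMs(factory, xml);
            System.out.println("Time: " + timings[run] + "ms");
        }
        System.out.println("Average of last 8: " + averageOfLast(timings, 8) + "ms");
    }
}
```

Applied to the ten timings listed above, `averageOfLast(timings, 8)` gives 499 (3995 / 8, truncated), matching the quoted average.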

If I now do a JAXP identity transform on the same input, from SAXSource to SAXResult (no XSLT involved, but using Saxon-HE as the identity transformer), so I now have parsing cost plus serialization cost (the serialized output is in memory and is discarded), the timings become

Average of last 8: 1155ms

suggesting that the serialization cost is about 650ms.
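The identity-transform step can be sketched with plain JAXP: `TransformerFactory.newTransformer()` with no arguments returns a transformer that copies its source to its result without any stylesheet. This sketch differs from the setup described above in one assumed respect: it serializes into a `StreamResult` over an in-memory `ByteArrayOutputStream` rather than a `SAXResult`. To pin the factory to Saxon-HE, `TransformerFactory.newInstance("net.sf.saxon.TransformerFactoryImpl", null)` can be used when Saxon is on the classpath; the helper name `identityCopy` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;

final class IdentityTransformBenchmark {
    /** JAXP identity transform (no XSLT): parse + serialize; returns the output size in bytes. */
    static int identityCopy(TransformerFactory factory, byte[] xml) throws Exception {
        Transformer identity = factory.newTransformer(); // no stylesheet: copies source to result
        ByteArrayOutputStream out = new ByteArrayOutputStream(); // in-memory sink, then discarded
        identity.transform(new SAXSource(new InputSource(new ByteArrayInputStream(xml))), new StreamResult(out));
        return out.size();
    }

    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < 200_000; i++) sb.append("<item id='").append(i).append("'>text</item>");
        byte[] xml = sb.append("</root>").toString().getBytes(StandardCharsets.UTF_8);

        TransformerFactory factory = TransformerFactory.newInstance(); // whichever JAXP implementation is configured
        long[] timings = new long[10];
        for (int run = 0; run < timings.length; run++) {
            long start = System.nanoTime();
            identityCopy(factory, xml);
            timings[run] = (System.nanoTime() - start) / 1_000_000;
        }
        long sum = 0;
        for (int i = 2; i < timings.length; i++) sum += timings[i]; // skip two warm-up runs
        System.out.println("Average of last 8: " + sum / 8 + "ms");
    }
}
```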

Now let's measure parsing plus tree-building, by doing Saxon's configuration.buildDocument() on the same source. This gives timings:

Average of last 8: 989ms

So the picture we are getting is: parsing 500ms, tree-building 500ms, serialization 650ms (total 1650ms)

If I now replace the JAXP identity transformer with a stylesheet, identity.xsl, containing the classic identity transformation rule, and keep stylesheet compilation out of the measurement loop, I'm seeing transformation times of:

Average of last 8: 2514ms

Which suggests: parsing 500ms, tree-building 500ms, transformation 850ms, serialization 650ms

That is, the actual transformation cost is about one-third of the total.
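The breakdown is plain subtraction of the measured averages. Spelled out with the unrounded figures (the method name `breakdown` is illustrative):

```java
// Per-phase costs derived by subtracting the averages quoted above (all values in ms).
final class CostBreakdown {
    /** Returns {parse, treeBuilding, transformation, serialization}. */
    static long[] breakdown(long parse, long parseAndSerialize, long parseAndTree, long identityXsl) {
        long serialization = parseAndSerialize - parse;                    // JAXP identity copy minus bare parse
        long tree = parseAndTree - parse;                                  // buildDocument() minus bare parse
        long transformation = identityXsl - parse - tree - serialization;  // what remains for the XSLT itself
        return new long[] {parse, tree, transformation, serialization};
    }

    public static void main(String[] args) {
        long[] b = breakdown(499, 1155, 989, 2514);
        double share = (double) b[2] / 2514; // transformation as a fraction of the total
        System.out.printf("parse %d, tree %d, transform %d, serialize %d ms; transform share %.2f%n",
                b[0], b[1], b[2], b[3], share);
    }
}
```

This prints `parse 499, tree 490, transform 869, serialize 656 ms; transform share 0.35`, consistent with the rounded figures and the "about one-third" estimate.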

If we replace the identity transform by a minimal transformation that just does <xsl:template match="/"/>, I get timings:

Average of last 8: 953ms

which is essentially the same as the parsing plus tree-building time, so the transformation itself adds no measurable cost.

If instead I make the root template compute count(//*), the cost increases only to 1027ms, so it is still hardly more than the parsing and tree-building cost.
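The three stylesheets discussed here (the identity rule, the empty root template, and a root template computing `count(//*)`) can be timed with one harness that compiles each stylesheet once, outside the loop, and reuses the resulting `Templates`. The stylesheet texts follow the descriptions above; the `xsl:output method='text'` on the count stylesheet, the synthetic input, and the helper names `compile` and `run` are assumptions of this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StylesheetBenchmark {
    static final String NS = "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'";
    // The classic identity rule (identity.xsl).
    static final String IDENTITY = "<xsl:stylesheet version='1.0' " + NS + ">"
            + "<xsl:template match='@*|node()'><xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>"
            + "</xsl:stylesheet>";
    // Minimal stylesheet: the root template produces nothing.
    static final String EMPTY = "<xsl:stylesheet version='1.0' " + NS + "><xsl:template match='/'/></xsl:stylesheet>";
    // Root template that visits every element once to count it.
    static final String COUNT = "<xsl:stylesheet version='1.0' " + NS + "><xsl:output method='text'/>"
            + "<xsl:template match='/'><xsl:value-of select='count(//*)'/></xsl:template></xsl:stylesheet>";

    /** Compiles a stylesheet; the returned Templates is reusable and thread-safe. */
    static Templates compile(String xsl) throws Exception {
        return TransformerFactory.newInstance().newTemplates(new StreamSource(new StringReader(xsl)));
    }

    /** One run: parse + build the processor's tree + transform + serialize to memory. */
    static String run(Templates compiled, byte[] xml) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        compiled.newTransformer().transform(new StreamSource(new ByteArrayInputStream(xml)), new StreamResult(out));
        return out.toString(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < 200_000; i++) sb.append("<item id='").append(i).append("'>text</item>");
        byte[] xml = sb.append("</root>").toString().getBytes(StandardCharsets.UTF_8);

        String[][] cases = {{"identity", IDENTITY}, {"match='/'", EMPTY}, {"count(//*)", COUNT}};
        for (String[] c : cases) {
            Templates compiled = compile(c[1]); // compilation kept out of the measurement loop
            long sum = 0;
            for (int i = 0; i < 10; i++) {
                long start = System.nanoTime();
                run(compiled, xml);
                long ms = (System.nanoTime() - start) / 1_000_000;
                if (i >= 2) sum += ms; // discard two warm-up runs
            }
            System.out.println(c[0] + ": average of last 8: " + sum / 8 + "ms");
        }
    }
}
```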

I think that for many simple "recursive descent" stylesheets that process each input node once and don't do a great deal of computation, the costs are not very different from these. Certainly, the parsing cost is not "dwarfed".

The compilation time for the identity stylesheet is being reported as 3ms, which is negligible for this scenario (large source document, tiny stylesheet); but when you switch to a scenario with a 10 KB document transformed through the DocBook stylesheets, it's the compilation cost that dominates entirely.
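Compilation cost can be measured separately from transformation cost by timing `TransformerFactory.newTemplates` on its own. This sketch times the identity stylesheet; passing `compileMs` a `StreamSource` over a large stylesheet such as the DocBook XSL stylesheets (location not given here) would show the opposite balance. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

final class CompileTimer {
    /** Wall-clock time, in ms, to compile one stylesheet into reusable Templates. */
    static double compileMs(TransformerFactory factory, Source stylesheet) throws Exception {
        long start = System.nanoTime();
        factory.newTemplates(stylesheet); // compile only; no document is transformed
        return (System.nanoTime() - start) / 1e6;
    }

    public static void main(String[] args) throws Exception {
        String identity = "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:template match='@*|node()'><xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>"
                + "</xsl:stylesheet>";
        TransformerFactory factory = TransformerFactory.newInstance();
        // A StreamSource is consumed once, so each run gets a fresh one; early runs include JVM warm-up.
        for (int i = 0; i < 10; i++) {
            System.out.printf("compile: %.1f ms%n", compileMs(factory, new StreamSource(new StringReader(identity))));
        }
    }
}
```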

>
> If Saxon is relatively weak at compilation time, why did you drop the pre-compiled stylesheet capability?

Two reasons: (a) it didn't work reliably enough, and (b) hardly anyone used it. In fact, it has been replaced in current releases by the stylesheet export capability. But one of the lessons we have learned is that people won't sit down and spend a couple of hours working out how to improve the performance of their workload unless things are really chronically bad. Moreover, when they compare products against each other, they won't usually adjust the way they run their tests so that each product performs at its best.

Michael Kay
Saxonica
_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address:
http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive:
http://lists.xml.org/archives/xml-dev/
List Guidelines:
http://www.oasis-open.org/maillists/guidelines.php









Copyright 1993-2007 XML.org. This site is hosted by OASIS