Re: [xml-dev] Rick Jelliffe's article on XSLT 1.0 performance


From: "Hermann Stamm-Wilbrandt" <STAMMW@de.ibm.com>
To: Michael Kay <mike@saxonica.com>
Date: Wed, 15 Feb 2017 14:26:20 +0100

> Basically, there are some workloads where compile-time matters (because you're compiling the stylesheet once per transformation),
>
I would not have responded back when you had to pay for an IBM DataPower Gateway just to run some benchmarks.
But since last year a Docker version of DataPower has been available for free (for development purposes only).

DataPower provides compilers for XSLT 1.0 (with EXSLT and many proprietary extension functions), XQuery 1.0 (with JSONiq) and GatewayScript (a Node.js flavour).
For XSLT, DataPower compiles to machine code and stores it in the so-called stylesheet cache. So compile time has to be paid only once, when stylesheet (re-)compilation is needed (recompilation happens e.g. for a changed stylesheet, or after the stylesheet cache has been flushed). All subsequent transformations run the binary code from the stylesheet cache without any compilation overhead.
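
For readers without a DataPower at hand, the same compile-once, run-many pattern can be sketched with plain JAXP. This is only an illustration under stated assumptions, not how DataPower implements its stylesheet cache: the class name `StylesheetCacheSketch`, the string cache key and the `COMPILATIONS` counter are hypothetical, and the JDK's built-in XSLT processor stands in for a machine-code compiler.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: a tiny stylesheet cache built on JAXP Templates objects.
public class StylesheetCacheSketch {
    static final String IDENTITY_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template></xsl:stylesheet>";

    private static final TransformerFactory FACTORY = TransformerFactory.newInstance();
    private static final Map<String, Templates> CACHE = new HashMap<>();
    // Counts real compilations so the amortization can be observed.
    static final AtomicInteger COMPILATIONS = new AtomicInteger();

    // Returns the compiled stylesheet for this key, compiling only on a cache miss.
    static synchronized Templates compiled(String key, String xsl) throws TransformerException {
        Templates templates = CACHE.get(key);
        if (templates == null) {
            COMPILATIONS.incrementAndGet();
            templates = FACTORY.newTemplates(new StreamSource(new StringReader(xsl)));
            CACHE.put(key, templates);
        }
        return templates;
    }

    // Runs one transformation using the cached compiled stylesheet.
    static String transform(String key, String xsl, String xml) throws TransformerException {
        StringWriter out = new StringWriter();
        compiled(key, xsl).newTransformer()
            .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    // Flushing the cache forces recompilation on the next use.
    static synchronized void flush() {
        CACHE.clear();
    }

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 1000; i++) {
            transform("identity", IDENTITY_XSL, "<a><b/></a>");
        }
        System.out.println("Compilations after 1000 transformations: " + COMPILATIONS.get());
    }
}
```

In this sketch a changed stylesheet would need a new key (or a call to `flush()`) to trigger recompilation; a real cache would more likely key on the stylesheet's location and modification time.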

I don't run Windows, so I cannot do the PowerShell test, but you can if you want.

Mit besten Gruessen / Best wishes,

Hermann Stamm-Wilbrandt
Compiler Level 3 support & Fixpack team lead
IBM DataPower Gateways (⬚ᵈᵃᵗᵃ / ⣏⠆⡮⡆⢹⠁⡮⡆⡯⠂⢎⠆⡧⡇⣟⡃⡿⡃)
https://www.ibm.com/developerworks/mydeveloperworks/blogs/HermannSW/
https://twitter.com/HermannSW/ https://stamm-wilbrandt.de/GraphvizFiddle/
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Chairwoman of the Supervisory Board: Martina Koederitz
Management: Dirk Wittkopp
Registered office: Boeblingen
Court of registration: Amtsgericht Stuttgart, HRB 243294

Michael Kay ---12.02.2017 10:29:48---> > I agree that benchmarking the usual expected cases is just as important as benchmarking edge c

From: Michael Kay <mike@saxonica.com>
To: Rick Jelliffe <rjelliffe@allette.com.au>
Cc: XML Developers List <xml-dev@lists.xml.org>
Date: 12.02.2017 10:29
Subject: Re: [xml-dev] Rick Jelliffe's article on XSLT 1.0 performance

> > I agree that benchmarking the usual expected cases is just as important as benchmarking edge cases prone to blowouts. But I am not so sure why you think we cannot add the numbers up

Basically, there are some workloads where compile-time matters (because you're compiling the stylesheet once per transformation), and there are others where it doesn't (the typical case being where you're doing server-side transformation and the cost of compiling the stylesheet is amortized over thousands of transformations). Historically, that's the workload we have optimized for (and we're now realising that might have been a mistake). To provide figures that can be extrapolated to those very different kinds of workload, you really need more than one number.

> I think it shows that for large documents, the cost of XML parsing is utterly dwarfed by the costs of the in-memory data structures and algorithms used for processing.

That's not what I'm seeing. For a minimal parse of a 119Mb input file (Apache Xerces, SAX parse to a do-nothing ContentHandler), run 10 times to assess VM warmup time, I'm seeing the timings:

Time: 735ms
Time: 555ms
Time: 494ms
Time: 517ms
Time: 486ms
Time: 503ms
Time: 508ms
Time: 497ms
Time: 504ms
Time: 486ms
Average of last 8: 499ms

(In future, I'll just give the "average of last 8")

If I now do a JAXP identity transform on the same input, from SAXSource to SAXResult (no XSLT involved, but using Saxon-HE as the identity transformer), so I now have parsing cost plus serialization cost (the serialized output is in memory and is discarded), the timings become

Average of last 8: 1155ms

suggesting that the serialization cost is about 650ms.

Now let's measure parsing plus tree-building, by doing Saxon's configuration.buildDocument() on the same source. This gives timings:

Average of last 8: 989ms

So the picture we are getting is: parsing 500ms, tree-building 500ms, serialization 650ms (total 1650ms)

If I now substitute the JAXP identity transformer with a stylesheet identity.xsl containing the classic identity transformation rule, and with stylesheet compilation out of the measurement loop, I'm seeing transformation times of:

Average of last 8: 2514ms

Which suggests: parsing 500ms, tree-building 500ms, transformation 850ms, serialization 650ms

That is, the actual transformation cost is about one-third of the total.

If we replace the identity transform by a minimal transformation that just does <xsl:template match="/"/>, I get timings:

Average of last 8: 953ms

which is essentially the same as the parsing plus tree building time, so the transformation adds no cost.

If instead I make the root template compute count(//*), the cost increases only to 1027ms, so it is still hardly more than the parsing and tree-building cost.

I think that for many simple "recursive descent" stylesheets that process each input node once and don't do a great deal of computation, the costs are not very different from these. Certainly, the parsing cost is not "dwarfed".

The compilation time for the identity stylesheet is being reported as 3ms, which is negligible for this scenario (large source document, tiny stylesheet) - but when you switch to a scenario with a 10Kb document transformed through the Docbook stylesheets, it's the compilation cost that dominates entirely.

> > If Saxon is relatively weak at compilation time, why did you drop the pre-compiled stylesheet capability?

Two reasons: (a) it didn't work reliably enough, and (b) hardly anyone used it. In fact it's replaced in current releases by the stylesheet export capability.

But one of the lessons we have learned is that people won't sit down and spend a couple of hours working out how to improve the performance of their workload unless things are really chronically bad. Moreover, when they compare products against each other, they won't usually adjust the way they run their tests so that each product performs at its best.

Michael Kay
Saxonica

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
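
The measurement procedure in the quoted message (a SAX parse to a do-nothing handler, run 10 times, reporting the average of the last 8 runs to discount VM warmup) can be sketched roughly as follows. This is a minimal sketch, not the harness that produced the figures above: the class name `ParseTimingSketch` and the generated in-memory document are assumptions standing in for the 119Mb input file, and the JDK's default SAX parser is used rather than a separately installed Apache Xerces.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of the "run 10 times, average the last 8" measurement.
public class ParseTimingSketch {

    // Builds a synthetic document; a stand-in for the real input file.
    static byte[] sampleDocument(int items) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < items; i++) {
            sb.append("<item id='").append(i).append("'>text</item>");
        }
        return sb.append("</root>").toString().getBytes(StandardCharsets.UTF_8);
    }

    // Parses the document `runs` times with a do-nothing handler and
    // returns the elapsed time of each run in milliseconds.
    static long[] timeParses(byte[] xml, int runs) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            SAXParser parser = factory.newSAXParser();
            long start = System.nanoTime();
            parser.parse(new ByteArrayInputStream(xml), new DefaultHandler());
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    // Averages the last n values, discarding the earlier warmup runs.
    static long averageOfLast(long[] values, int n) {
        long sum = 0;
        for (int i = values.length - n; i < values.length; i++) {
            sum += values[i];
        }
        return Math.round((double) sum / n);
    }

    public static void main(String[] args) throws Exception {
        long[] timings = timeParses(sampleDocument(200_000), 10);
        for (long ms : timings) {
            System.out.println("Time: " + ms + "ms");
        }
        System.out.println("Average of last 8: " + averageOfLast(timings, 8) + "ms");
    }
}
```

Applied to the ten timings listed in the message, `averageOfLast` gives 499ms, matching the reported figure. The later component costs are then obtained by subtraction between runs of different pipelines, for example 1155ms (parse plus serialize) minus 499ms (parse only) for the serialization estimate of about 650ms.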

Follow-Ups:
- Re: [xml-dev] Rick Jelliffe's article on XSLT 1.0 performance
  - From: Rick Jelliffe <rjelliffe@allette.com.au>

References:
- Rick Jelliffe's article on XSLT 1.0 performance
  - From: Michael Kay <mike@saxonica.com>
- Re: [xml-dev] Rick Jelliffe's article on XSLT 1.0 performance
  - From: Rick Jelliffe <rjelliffe@allette.com.au>
- Re: [xml-dev] Rick Jelliffe's article on XSLT 1.0 performance
  - From: Michael Kay <mike@saxonica.com>
