Re: [xml-dev] DOM or SAX: Sense and Sensibility
On Tue, 13 Nov 2001, PaulT wrote:
>
> From: "Benjamin Franz" <snowhare@nihongo.org>
> >
> > Using Saxon's native 'DocumentInfo' tree in memory - an at least 10X
> > _plus_ performance boost on large documents vs starting from scratch with
> > the serialized XML document. I think that _is_ huge.
>
> I also think that 10X _is_ huge, but I think there is some misunderstanding.
> Let me elaborate.
Ok. :)
> > 'Live-parsed' vs 'statically cached-OM' then.
>
> So, we have 'document.xml' and transformation 'transform.xsl'
>
> 1. We load document.xml from XML, we apply the
> transformation 'transform.xsl'
>
> total time is t1.
> - t1.1 is time of parsing XML into OM,
> - ttrans is time of transformation
>
> t1.1 + ttrans = t1
>
> 2. We load document.xml from binary OM, we apply
> the transformation 'transform.xsl'
>
> total time is t2.
> - t2.1 is time of loading XML from OM,
> - ttrans is time of transformation
>
> t2.1 + ttrans = t2
>
> It could be that t1.1 = 10 * t2.1
[...]
> the difference between t1 and t2 is 10-15%
Probably true when loading from a disk.
> I have an impression, that you're comparing t2.1 vs t1.1.
Not exactly. I have enough memory that I keep the OM _in memory_ nearly
all the time. Thus my comparison is effectively between t1 and ttrans,
since I don't spend any real time doing t2.1 (it is amortized across many
renders and thus t2.1 approaches zero on a per-render basis
asymptotically).
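To make the amortization concrete, here is a minimal Python sketch. The
function name and the timings are illustrative assumptions, not
measurements from this thread.

```python
def per_render_cost(t_load, t_trans, renders):
    """Average time per render when the OM is loaded once and then reused."""
    if renders < 1:
        raise ValueError("renders must be at least 1")
    return (t_load + t_trans * renders) / renders


# Hypothetical timings in seconds (assumed for illustration only).
T_LOAD = 10.0   # one-time cost of getting the OM into memory
T_TRANS = 1.0   # cost of one transformation against the cached OM

for n in (1, 10, 100, 1000):
    # The load cost is spread over n renders, so the average tends to T_TRANS.
    print(n, round(per_render_cost(T_LOAD, T_TRANS, n), 3))
```

With these assumed numbers the average falls from 11 seconds for a single
render towards the bare transform time as the render count grows.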
> From my point of view, that part of the transformation
> is usually not a problem. ttrans is usually the 'real'
> showstopper (and I should stress that ttrans
> is also sensitive to caching of the stylesheet
> (to avoid recompilation of the stylesheet))
Yes - I cache the stylesheet as well.
> When saying 'typical transform' I'm talking about files of 10-50K
Ah. I'm handling 10M documents in a typical transform in this case. That
makes a significant difference in where I spend my time. The raw
transform currently takes less than 1 second normally (roughly 11 seconds
if the OM and stylesheet caches have not yet been filled). I also use an
intelligent second level results cache to improve on that by about another
factor of 50 for the typical case before delivery to a web browser.
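A minimal Python sketch of this kind of layered caching follows. It is a
generic illustration, not the actual setup described above: the class
`RenderCache`, the stand-in `fake_*` functions and the `page` parameter are
all hypothetical, and there is no invalidation when a source file changes.

```python
class RenderCache:
    """Caches parsed trees and compiled stylesheets, plus finished results."""

    def __init__(self, parse, compile_stylesheet, transform):
        self._parse = parse
        self._compile = compile_stylesheet
        self._transform = transform
        self._trees = {}        # first level: document path -> in-memory tree
        self._stylesheets = {}  # first level: stylesheet path -> compiled form
        self._results = {}      # second level: (doc, xsl, params) -> output

    def render(self, doc_path, xsl_path, params=()):
        key = (doc_path, xsl_path, tuple(params))
        if key in self._results:
            return self._results[key]          # no transform needed at all
        if doc_path not in self._trees:
            self._trees[doc_path] = self._parse(doc_path)
        if xsl_path not in self._stylesheets:
            self._stylesheets[xsl_path] = self._compile(xsl_path)
        result = self._transform(
            self._trees[doc_path], self._stylesheets[xsl_path], params
        )
        self._results[key] = result
        return result


# Stand-ins for a real parser and XSLT processor; they only count calls.
calls = {"parse": 0, "compile": 0, "transform": 0}

def fake_parse(path):
    calls["parse"] += 1
    return ("tree", path)

def fake_compile(path):
    calls["compile"] += 1
    return ("compiled", path)

def fake_transform(tree, stylesheet, params):
    calls["transform"] += 1
    return "output for %s with %s" % (tree[1], dict(params))

cache = RenderCache(fake_parse, fake_compile, fake_transform)
cache.render("document.xml", "transform.xsl")
cache.render("document.xml", "transform.xsl")                    # results cache hit
cache.render("document.xml", "transform.xsl", (("page", "2"),))  # reuses tree and stylesheet
print(calls)  # {'parse': 1, 'compile': 1, 'transform': 2}
```

The document is parsed and the stylesheet compiled only once, and a repeated
request with the same parameters skips the transform entirely.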
--
Benjamin Franz
"Code as if whoever maintains your code is a violent
psychopath who knows where you live."
-- Nancy Lebovitz, the button lady