Re: [xml-dev] DOM or SAX: Sense and Sensibility



On Tue, 13 Nov 2001, PaulT wrote:
>
> From: "Benjamin Franz" <snowhare@nihongo.org>
> >
> > Using Saxon's native 'DocumentInfo' tree in memory - an at least 10X
> > _plus_ performance boost on large documents vs starting from scratch with
> > the serialized XML document. I think that _is_ huge.
>
> I also think that 10X _is_ huge, but I think there is some misunderstanding.
> Let me elaborate.

Ok. :)

> > 'Live-parsed' vs 'statically cached-OM' then.
>
> So, we have 'document.xml' and the transformation 'transform.xsl'
>
> 1. We load document.xml from XML, we apply the
> transformation 'transform.xsl'
>
> total time is t1.
> - t1.1 is time of parsing XML into OM,
> - ttrans is time of transformation
>
> t1.1 + ttrans = t1
>
> 2. We load document.xml from binary OM, we apply
> the transformation 'transform.xsl'
>
> total time is t2.
> - t2.1 is time of loading the OM from its binary form,
> - ttrans is time of transformation
>
> t2.1 + ttrans = t2
>
> It could be that t1.1 = t2.1 * 10

[...]

> the difference between t1 and t2 is 10-15%

Probably true when loading from a disk.

> I have an impression, that you're comparing t2.1 vs t1.1.

Not exactly. I have enough memory that I keep the OM _in memory_ nearly
all the time. Thus my comparison is effectively between t1 and ttrans
since I don't spend any real time doing t2.1 (it is amortized across many
renders and thus t2.1 approaches zero on a per render basis
asymptotically).
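
To make the amortization argument concrete, here is a small sketch of the
per-render cost when one load is shared across many renders. The function
name and the timing constants are illustrative assumptions, not
measurements; only the relationship (load cost / renders + ttrans) comes
from the discussion above.

```python
def per_render_cost(t_load: float, t_trans: float, renders: int) -> float:
    """Average cost of one render when a single load is shared by `renders` transforms."""
    if renders < 1:
        raise ValueError("renders must be at least 1")
    return t_load / renders + t_trans


# Illustrative numbers only (seconds); assumed for the example, not measured.
T_PARSE = 10.0  # t1.1: parse the serialized XML into the OM
T_TRANS = 1.0   # ttrans: run the transformation itself

uncached = per_render_cost(T_PARSE, T_TRANS, 1)          # pays t1.1 on every render
cached_100 = per_render_cost(T_PARSE, T_TRANS, 100)      # load shared by 100 renders
cached_10000 = per_render_cost(T_PARSE, T_TRANS, 10000)  # load cost nearly vanishes
```

As the number of renders grows, the load term shrinks toward zero and the
per-render cost approaches ttrans, which is the comparison being made here.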

> From my point of view, that part of the transformation
> is usually not a problem. ttrans is usually the 'real'
> showstopper (and I should stress that ttrans
> is also sensitive to caching of the stylesheet,
> to avoid recompilation of the stylesheet).

Yes - I cache the stylesheet as well.
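
A minimal sketch of this kind of in-memory caching, assuming the cached
object can be rebuilt from a file and that a changed modification time is
a good enough signal to reload. The class name `MtimeCache` is
hypothetical, and Python's `xml.etree.ElementTree` stands in for the real
object model; the same wrapper could hold a compiled stylesheet if given
a suitable loader.

```python
import os
import xml.etree.ElementTree as ET
from typing import Any, Callable, Dict, Tuple


class MtimeCache:
    """Keep loaded objects in memory, reloading only when the file changes."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader  # e.g. an XML parser or a stylesheet compiler
        self._entries: Dict[str, Tuple[int, Any]] = {}
        self.loads = 0         # how many times the expensive loader actually ran

    def get(self, path: str) -> Any:
        mtime = os.stat(path).st_mtime_ns
        entry = self._entries.get(path)
        if entry is not None and entry[0] == mtime:
            return entry[1]            # hit: reuse the in-memory object
        value = self._loader(path)     # miss: pay the load cost once
        self.loads += 1
        self._entries[path] = (mtime, value)
        return value


# One cache for parsed documents; ET.parse is the stand-in loader.
document_cache = MtimeCache(ET.parse)
```

Calling `document_cache.get("document.xml")` repeatedly parses the file
only on the first call and after the file is modified.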

> When saying 'typical transform' I'm talking about files of 10-50K

Ah. I'm handling 10M documents in a typical transform in this case. That
makes a significant difference in where I spend my time.  The raw
transform currently takes less than 1 second normally (roughly 11 seconds
if the OM and stylesheet caches have not yet been filled). I also use an
intelligent second level results cache to improve on that by about another
factor of 50 for the typical case before delivery to a web browser.
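
The details of the results cache are not given here, so the following is
only one possible shape for such a layer: a bounded LRU cache of rendered
output, keyed by everything assumed to affect the result. The class name
`ResultCache`, the key layout (document version, stylesheet version,
parameters), and the `render` callable are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Sequence, Tuple


class ResultCache:
    """Bounded LRU cache of rendered output (a sketch, not the actual cache)."""

    def __init__(self, render: Callable[..., str], max_entries: int = 128) -> None:
        self._render = render      # the expensive transform, supplied by the caller
        self._max = max_entries
        self._store: "OrderedDict[Tuple[Hashable, ...], str]" = OrderedDict()
        self.renders = 0           # how many times the transform actually ran

    def get(self, doc_version: Hashable, sheet_version: Hashable,
            params: Sequence[Hashable] = ()) -> str:
        key = (doc_version, sheet_version, tuple(params))
        if key in self._store:
            self._store.move_to_end(key)     # mark as most recently used
            return self._store[key]
        result = self._render(doc_version, sheet_version, params)
        self.renders += 1
        self._store[key] = result
        if len(self._store) > self._max:
            self._store.popitem(last=False)  # evict the least recently used entry
        return result
```

Keying on versions rather than on file contents means a changed document
or stylesheet naturally misses the cache, while repeated requests for the
same page skip the transform entirely.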

-- 
Benjamin Franz

 "Code as if whoever maintains your code is a violent
  psychopath who knows where you live."
                    -- Nancy Lebovitz, the button lady