   Re: [xml-dev] Seeking Examples of XSLT Memory Stress


On Wed, Aug 17, 2005 at 06:53:41PM +0100, Michael Kay wrote:
>> If the document falls out of scope then both XSLT 1 and 2 allow
>> an implementation to discard it.  I don't think we'll see a
>> procedural way to discard a document otherwise, except as
>> part of something like the XQuery update facility perhaps.

> In practice it's quite difficult to discard the document automatically. The
> spec offers two guarantees:
> (a) if the same document (URI) is loaded again, you'll get the same node
> identifiers
> (b) if the same document (URI) is loaded again, it will have the same
> content
> It would be possible to discard the document and achieve (a) by remembering
> the node identifiers and reusing them if needed. 

> Achieving (b) though is really hard, given that the URI might in the
> worst case identify a random number generator. The only real way to do
> it is to serialize a private copy of the document to disk.

You could also behave differently depending on the URI scheme --
an extension to say "trust http expiry times (and that the stylesheet
will take no more than 3 hours to run :-) and trust that input files
won't change on disk" might be interesting.
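
A rough sketch of what such a per-scheme policy might look like, in Python
rather than inside any real XSLT processor. The function name, its
parameters and the three-hour default are all made up for illustration;
no processor is claimed to expose anything like this.

```python
from urllib.parse import urlsplit


def can_discard(uri, http_expires_in=None, max_run_seconds=3 * 3600):
    """Guess whether a loaded document could be dropped and re-read later.

    Hypothetical policy: http_expires_in is the number of seconds until the
    HTTP response expires (None if unknown); max_run_seconds is the longest
    we assume the stylesheet will run.
    """
    scheme = urlsplit(uri).scheme.lower()
    if scheme in ("", "file"):
        # Assumption: input files do not change on disk during the run.
        return True
    if scheme in ("http", "https"):
        # Trust the expiry time only if it outlasts the whole run.
        return http_expires_in is not None and http_expires_in >= max_run_seconds
    # Unknown scheme (it might be the random number generator): keep it.
    return False
```

Anything the policy does not recognise stays in memory, so the worst case
is no worse than what happens today.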

> The real problem though is in deciding when it's a good idea to discard the
> document. For example, if the stylesheet is working its way through the
> @href links from the primary source document, what's the chance that you'll
> want to visit the same target document more than once?

Are there some special cases that are big wins in practice?
E.g. consider:
    <xsl:template match="foo">
      <!--* load a 500MByte XML file: *-->
      <xsl:variable name="oed" select="doc('oed.xml')" />
      <!--* do stuff with the document *-->
      <xsl:element name="word-of-the-day">
        <xsl:copy-of select="$oed/dictionary/a/entry[@id = 'ascii']" />
      </xsl:element>
    </xsl:template>

if you don't know how often the template matches I can see that you
might want to cache the whole document in memory, but you have a
couple of other choices --
(1) save the result of the template -- in this case it doesn't depend on
    anything other than the input document, and I've seen this usage
    often, e.g. to get a document title
(2) drop the document if you get low on memory 
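
To make the two choices concrete, here is a minimal sketch in Python. The
DocumentPool class, its loader callback and the byte budget are all
hypothetical, not how any particular processor works. It also assumes that
a dropped document can be re-read with identical content, and that a saved
result depends on nothing but the loaded document.

```python
from collections import OrderedDict


class DocumentPool:
    """Keeps parsed documents within a byte budget, least recently used first out."""

    def __init__(self, loader, budget):
        self.loader = loader        # hypothetical: uri -> (tree, size_in_bytes)
        self.budget = budget        # maximum total size to keep in memory
        self.docs = OrderedDict()   # uri -> (tree, size), oldest first
        self.results = {}           # (uri, name) -> saved result, choice (1)

    def doc(self, uri):
        if uri in self.docs:
            self.docs.move_to_end(uri)   # mark as most recently used
            return self.docs[uri][0]
        tree, size = self.loader(uri)
        self.docs[uri] = (tree, size)
        # Choice (2): when over budget, drop the least recently used
        # documents, but always keep the one just loaded.
        while len(self.docs) > 1 and sum(s for _, s in self.docs.values()) > self.budget:
            self.docs.popitem(last=False)
        return tree

    def query(self, uri, name, fn):
        # Choice (1): save the result so the document is not needed again.
        key = (uri, name)
        if key not in self.results:
            self.results[key] = fn(self.doc(uri))
        return self.results[key]
```

With the result saved, a second match of the template never touches the
500MByte tree, even if it was dropped in the meantime.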

This case is very clear, but I don't know at what point it stops
being optimisable, and I'm sure you've thought about it a lot more
than I have! :-)

> That's why I decided
> that in this case having a user function to tell me when the document is no
> longer needed is rather more useful.

I think it's a good compromise, but I agree with you it'd be hard
to get consensus to add that to XPath F&O.
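
For comparison, the user-driven approach might be sketched like this in
Python. ExplicitPool and its discard method are invented names for
illustration only, not the actual extension function or its signature.

```python
class ExplicitPool:
    """Holds every loaded document until the caller says it is done with it."""

    def __init__(self, loader):
        self.loader = loader   # hypothetical: uri -> parsed tree
        self.docs = {}

    def doc(self, uri):
        # Repeated loads of the same URI return the same tree object.
        if uri not in self.docs:
            self.docs[uri] = self.loader(uri)
        return self.docs[uri]

    def discard(self, uri):
        # The caller promises not to need this document again; hand back
        # the tree (or None if it was never loaded) and forget it.
        return self.docs.pop(uri, None)


def follow_links(pool, hrefs, process):
    """Visit each linked document once, releasing it straight afterwards."""
    out = []
    for href in hrefs:
        out.append(process(pool.doc(href)))
        pool.discard(href)
    return out
```

The processor never has to guess: working through the @href links holds at
most one target document at a time, because the stylesheet author said so.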


Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/

