You might find it better to ask such questions on the xsl-list at
mulberrytech.com, or if you're really interested only in Xalan, on a
Xalan-specific forum.

In general, every mainstream XSLT processor today builds a tree
representation of the input document in memory. I believe Xalan does parsing
and transformation in parallel, but it still builds the tree. The fact that
the parser and the transformer communicate using SAX does not change this -
it just means that the transformer, not the parser, builds the tree. (The
distinction does matter for efficiency, because the transformer can build a
much more efficient tree knowing it is read-only. But it's still an
in-memory tree.)
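
To illustrate the point about SAX, here is a minimal sketch in Python (not
Xalan's or Saxon's actual code) of a SAX consumer that assembles the events
it receives into a tree. The class name TreeBuildingHandler, the function
build_tree and the sample document are assumptions made up for this example.
The events arrive incrementally, yet by the end of the parse the whole
document is held in memory:

```python
import xml.sax
from xml.etree.ElementTree import TreeBuilder


class TreeBuildingHandler(xml.sax.ContentHandler):
    """Hypothetical SAX consumer that turns each event into a tree node."""

    def __init__(self):
        super().__init__()
        self.builder = TreeBuilder()
        self.root = None

    def startElement(self, name, attrs):
        # Every start tag becomes a node that stays in memory.
        self.builder.start(name, dict(attrs.items()))

    def endElement(self, name):
        self.builder.end(name)

    def characters(self, content):
        self.builder.data(content)

    def endDocument(self):
        # Only now is the complete tree available to the consumer.
        self.root = self.builder.close()


# Made-up sample input modelled on the mail merge document in the question.
SAMPLE = b"""<mailmerge>
  <letter>letter def goes here</letter>
  <recipients>
    <recipient>recipient data</recipient>
    <recipient>recipient data</recipient>
  </recipients>
</mailmerge>"""


def build_tree(xml_bytes):
    """Parse with SAX, but return a fully materialised tree."""
    handler = TreeBuildingHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.root
```

Calling build_tree(SAMPLE) returns the root element with every recipient
attached, which is why a SAX front end by itself does not reduce memory use.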

I can't speak for Xalan, but Saxon users are running transformations on
inputs of up to 200MB or so without too much trouble, and at speeds up to
10MB/sec. It requires a little care in configuring the memory allocation,
and in writing the stylesheet to avoid non-linear constructs, but it's
certainly doable. Beyond that, it probably gets difficult. You don't
actually say what you mean by a "large document". (Personally, I am amazed
to see people handling a 200MB database as a single in-memory document, but
perhaps I'm just old-fashioned.)
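
As a language-neutral illustration of what a "non-linear construct" costs,
the Python sketch below compares a lookup that rescans the whole list for
every item with one that builds an index once. The function names and the
"id" field are assumptions invented for this example. In a stylesheet, the
rescanning pattern typically shows up as a predicate that searches the whole
document from inside a template applied to every entry, and the usual remedy
is to declare an xsl:key and use key():

```python
def lookup_by_scan(records, wanted_ids):
    """Quadratic: rescans the record list for every id looked up."""
    comparisons = 0
    found = []
    for wanted in wanted_ids:
        for rec in records:
            comparisons += 1
            if rec["id"] == wanted:
                found.append(rec)
                break
    return found, comparisons


def lookup_by_index(records, wanted_ids):
    """Linear: builds the index once, then each lookup is a dict access."""
    index = {rec["id"]: rec for rec in records}
    return [index[wanted] for wanted in wanted_ids]
```

Both functions return the same records; only the amount of work differs,
and the gap grows with the square of the number of entries.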

If you really need purely serial processing, you might consider STX
(Streaming Transformations for XML) as an alternative. However, the existing
STX implementations are far less widely used and less mature than the
popular XSLT implementations.

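
For a sense of what purely serial processing of the mail merge document
looks like, here is a minimal sketch that uses Python's standard
xml.etree.ElementTree.iterparse rather than STX. The {name} placeholder, the
<name> child element and the function merge_serially are assumptions
invented for this example; the original message does not say what a letter
or a recipient contains. It works only because the letter precedes the
recipients in the document:

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample input; the real letter and recipient contents are unknown.
SAMPLE = b"""<mailmerge>
  <letter>Dear {name}, your order has shipped.</letter>
  <recipients>
    <recipient><name>Ann</name></recipient>
    <recipient><name>Bob</name></recipient>
  </recipients>
</mailmerge>"""


def merge_serially(source):
    """Yield one merged letter per recipient as the parser reaches it."""
    letter_text = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "letter":
            # The preamble is small, so keep it for the rest of the run.
            letter_text = elem.text.strip()
        elif elem.tag == "recipient":
            name = elem.findtext("name")
            yield letter_text.replace("{name}", name)
            # Drop the recipient's content. The emptied element stays
            # attached to its parent, so memory still grows, but slowly.
            elem.clear()
```

Iterating over merge_serially(io.BytesIO(SAMPLE)) produces each letter
before the following recipient has been parsed.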
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: Andrzej Jan Taramina [mailto:andrzej@chaeron.com]
> Sent: 03 December 2004 23:45
> To: xml-dev@lists.xml.org
> Subject: [xml-dev] Incremental transformations with Xalan and
> performance issues?
>
> I'm in a situation where I need to parse some large documents, where the
> first few elements are a preamble with various parameters and the end of
> the document is a large list of entries.
>
> Think of a mail merge, where the letter to be sent is defined first in the
> mail merge xml, followed by numerous recipient entries, something like
> this:
>
> <mailmerge>
>   <letter>
>     ...letter def goes here
>   </letter>
>   <recipients>
>     <recipient>
>       ...recipient data
>     </recipient>
>     <recipient>
>       ...recipient data
>     </recipient>
>     etc...
>   </recipients>
> </mailmerge>
>
> What I was wondering was how Xalan handles the processing of such large
> documents (say a million recipient entries) when the parser is using SAX?
>
> More specifically, if I create global variables such as:
>
> <xsl:variable name="letterTemplate" select="/mailmerge/letter"/>
>
> then later:
>
> <xsl:template match="recipients/recipient">
> <!-- process the recipient using $letterTemplate -->
> </xsl:template>
>
> Will the processing be incremental in nature, as SAX events are received
> by Xalan? That is, is Xalan smart enough to create the global as soon as
> it can, followed by processing of each individual recipient as each
> related SAX event is received? In that case, having the shared global info
> early in the document and the large list at the end would probably have
> beneficial performance implications.
>
> Or will the whole document have to be instantiated as some sort of
> internal tree first?
>
> Hopefully, it's incremental in nature, since otherwise we might blow out
> memory with such large documents.
>
> Any insight into the implications of processing such large documents,
> using globals, xslt stylesheet structure, impact of element ordering in
> the document and the like would be very much appreciated.
>
> Thanks!
>
> Andrzej Jan Taramina
> Chaeron Corporation: Enterprise System Solutions
> http://www.chaeron.com