xml-dev - RE: [xml-dev] Incremental transformations with Xalan and performance iss

To: <andrzej@chaeron.com>,<xml-dev@lists.xml.org>
Subject: RE: [xml-dev] Incremental transformations with Xalan and performance issues?
From: "Michael Kay" <michael.h.kay@ntlworld.com>
Date: Sat, 4 Dec 2004 17:19:48 -0000
In-reply-to: <41B0B41F.31301.E81D1EA@localhost>
Thread-index: AcTZknOIt0pMyIy1TweBskI7hsfX6wAkTn4g

You might find it better to ask such questions on the xsl-list at
mulberrytech.com, or if you're really interested only in Xalan, on a
Xalan-specific forum.

In general, every mainstream XSLT processor today builds a tree
representation of the input document in memory. I believe Xalan does parsing
and transformation in parallel, but it still builds the tree. The fact that
the parser and the transformer communicate using SAX is irrelevant - it just
means that the transformer and not the parser is building the tree. (This
isn't totally irrelevant, because the transformer can build a much more
efficient tree knowing it is read-only. But it's still an in-memory tree.)
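To illustrate the point that a SAX interface does not by itself imply streaming, here is a minimal sketch in Python (not Xalan's or Saxon's actual code). The class name `TreeBuildingHandler` and the sample document are assumptions made up for the illustration. The handler receives events one at a time, yet the whole document still ends up in memory as a tree.

```python
import xml.sax
import xml.etree.ElementTree as ET


class TreeBuildingHandler(xml.sax.ContentHandler):
    """Hypothetical consumer: fed by SAX events, but it still builds a full tree."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()
        self.root = None

    def startElement(self, name, attrs):
        # Every start tag becomes a node in the in-memory tree.
        self.builder.start(name, dict(attrs.items()))

    def characters(self, content):
        self.builder.data(content)

    def endElement(self, name):
        self.builder.end(name)

    def endDocument(self):
        # Only now is the complete tree available to a "transformer".
        self.root = self.builder.close()


# Made-up sample input, shaped like the mail merge document in the question.
SAMPLE = (
    b"<mailmerge><letter>Dear customer</letter>"
    b"<recipients><recipient>Ann</recipient><recipient>Bob</recipient></recipients>"
    b"</mailmerge>"
)


def build_tree(xml_bytes):
    handler = TreeBuildingHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.root


root = build_tree(SAMPLE)
print(len(root.findall("recipients/recipient")))
```

Memory use here grows with the size of the document, regardless of how the events were delivered.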

I can't speak for Xalan, but Saxon users are running transformations on
documents of up to 200MB or so without too much trouble, and at speeds of
up to 10MB/sec. It requires a little care in configuring the memory
allocation, and in writing the stylesheet to avoid non-linear constructs,
but it's certainly doable. Beyond that, it probably gets difficult. You
don't actually say what you mean by a "large document". (Personally, I am
amazed to see people handling a 200MB database as a single in-memory
document, but perhaps I'm just old-fashioned.)
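As a rough analogy for what "non-linear constructs" means, the Python sketch below contrasts a lookup that rescans the whole list for every request with one that builds an index first. In a stylesheet the corresponding choice is typically between a repeated search over the document and an `xsl:key` lookup. The function names and record layout are hypothetical, and the sketch assumes each `id` is unique.

```python
def lookup_by_scan(records, wanted_ids):
    # Rescans the full list for every id: cost grows with
    # len(records) * len(wanted_ids), i.e. quadratic on large inputs.
    return [next(r for r in records if r["id"] == i) for i in wanted_ids]


def lookup_by_index(records, wanted_ids):
    # One pass to build the index, then constant-time lookups:
    # cost grows roughly linearly with the input size.
    index = {r["id"]: r for r in records}
    return [index[i] for i in wanted_ids]


# Made-up data; assumes ids are unique.
records = [{"id": n, "name": "r%d" % n} for n in range(5)]
wanted = [3, 0, 4]

print(lookup_by_scan(records, wanted) == lookup_by_index(records, wanted))
```

Both functions return the same records; only the amount of work differs, and that difference is what matters at hundreds of megabytes.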

If you really need purely serial processing, you might consider STX as an
alternative. However, the existing STX implementations are far less
widely-used or mature than the popular XSLT implementations.
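For a sense of what purely serial processing of the mail merge document could look like, here is a minimal sketch using a hand-written SAX handler in Python. This is not STX and not an XSLT processor. The class `MailMergeHandler`, the `{name}` placeholder, and the sample data are all assumptions for illustration. It keeps only the letter text and the recipient currently being read, and it relies on the letter appearing before the recipients.

```python
import xml.sax


class MailMergeHandler(xml.sax.ContentHandler):
    """Hypothetical serial processor: holds the letter plus one recipient at a time."""

    def __init__(self, emit):
        super().__init__()
        self.emit = emit      # callback invoked once per merged letter
        self.letter = None    # preamble text, kept for the whole run
        self.current = None   # "letter" or "recipient" while buffering text
        self.buffer = []

    def startElement(self, name, attrs):
        if name in ("letter", "recipient"):
            self.current = name
            self.buffer = []

    def characters(self, content):
        if self.current is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name != self.current:
            return
        text = "".join(self.buffer).strip()
        if name == "letter":
            self.letter = text
        else:
            if self.letter is None:
                raise ValueError("recipient seen before the letter preamble")
            # "{name}" is a made-up placeholder convention for this sketch.
            self.emit(self.letter.replace("{name}", text))
        # Discard the buffered text so memory does not grow with the input.
        self.current = None
        self.buffer = []


# Made-up sample input.
SAMPLE = (
    b"<mailmerge><letter>Dear {name},</letter>"
    b"<recipients><recipient>Ann</recipient><recipient>Bob</recipient></recipients>"
    b"</mailmerge>"
)


def merge(xml_bytes):
    out = []
    xml.sax.parseString(xml_bytes, MailMergeHandler(out.append))
    return out


print(merge(SAMPLE))  # ['Dear Ann,', 'Dear Bob,']
```

In real use the `emit` callback would write each letter to a file or stream rather than collect results in a list, so that memory stays flat however many recipients follow the preamble.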

Michael Kay
http://www.saxonica.com/


> -----Original Message-----
> From: Andrzej Jan Taramina [mailto:andrzej@chaeron.com] 
> Sent: 03 December 2004 23:45
> To: xml-dev@lists.xml.org
> Subject: [xml-dev] Incremental transformations with Xalan and 
> performance issues?
> 
> I'm in a situation where I need to parse some large documents, where
> the first few elements are a preamble with various parameters and the
> end of the document is a large list of entries.
> 
> Think of a mail merge, where the letter to be sent is defined first in
> the mail merge XML, followed by numerous recipient entries, something
> like this:
> 
> <mailmerge>
> 	<letter>
> 		...letter def goes here
> 	</letter>
> 	<recipients>
> 		<recipient>
> 			...recipient data
> 		</recipient>
> 		<recipient>
> 			...recipient data
> 		</recipient>
> 		etc...
> 	</recipients>
> </mailmerge>
> 
> What I was wondering was how Xalan handles the processing of such
> large documents (say a million recipient entries) when the parser is
> using SAX?
> 
> More specifically, if I create global variables such as:
> 
> 	<xsl:variable name="letterTemplate" select="/mailmerge/letter"/>
> 
> then later:
> 
> 	<xsl:template match="recipients/recipient">
> 		<!-- process the recipient using $letterTemplate -->
> 	</xsl:template>
> 
> Will the processing be incremental in nature, as SAX events are
> received by Xalan?  That is, is Xalan smart enough to create the
> global as soon as it can, followed by processing of each individual
> recipient as each related SAX event is received?  In that case, having
> the shared global info early in the document and the large list at the
> end would probably have beneficial performance implications.
> 
> Or will the whole document have to be instantiated as some sort of
> internal tree first?
> 
> Hopefully, it's incremental in nature, since otherwise we might blow
> out memory with such large documents.
> 
> Any insight into the implications of processing such large documents,
> using globals, XSLT stylesheet structure, impact of element ordering
> in the document and the like would be very much appreciated.
> 
> Thanks!
> 
> Andrzej Jan Taramina
> Chaeron Corporation: Enterprise System Solutions
> http://www.chaeron.com
