RE: [xml-dev] Incremental transformations with Xalan and performance issues?

You might find it better to ask such questions on the xsl-list at
mulberrytech.com, or if you're really interested only in Xalan, on a
Xalan-specific forum.

In general, every mainstream XSLT processor today builds a tree
representation of the input document in memory. I believe Xalan does parsing
and transformation in parallel, but it still builds the tree. The fact that
the parser and the transformer communicate using SAX is irrelevant - it just
means that the transformer and not the parser is building the tree. (This
isn't totally irrelevant, because the transformer can build a much more
efficient tree knowing it is read-only. But it's still an in-memory tree.)
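
To illustrate the point, here is a minimal Python sketch (not Xalan or Saxon code) of a consumer that receives SAX events and still ends up holding the whole document as a tree. The names TreeBuildingHandler and build_tree are made up for this example; it uses only the standard xml.sax and xml.etree.ElementTree modules.

```python
import xml.sax
import xml.etree.ElementTree as ET


class TreeBuildingHandler(xml.sax.ContentHandler):
    """Hypothetical consumer: receives SAX events, builds a full tree."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()
        self.root = None  # set once the whole document has been seen

    def startElement(self, name, attrs):
        # Every element that arrives as an event is added to the tree.
        self.builder.start(name, dict(attrs.items()))

    def characters(self, content):
        self.builder.data(content)

    def endElement(self, name):
        self.builder.end(name)

    def endDocument(self):
        # Only now is the tree complete; nothing was discarded on the way.
        self.root = self.builder.close()


def build_tree(xml_text):
    """Parse xml_text via SAX and return the root of the in-memory tree."""
    handler = TreeBuildingHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.root
```

The parser here never builds a tree itself, yet memory use still grows with the size of the document, because the handler keeps every node.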

I can't speak for Xalan, but Saxon users are running transformations on
documents of up to 200MB or so without too much trouble, and at speeds up to
10MB/sec. It requires a little care in configuring the memory allocation, and
in writing the stylesheet to avoid constructs whose cost grows non-linearly
with document size, but it's certainly doable. Beyond that, it probably gets
difficult. You don't actually say what you mean by a "large document".
(Personally, I am amazed to see people handling a 200MB database as a single
in-memory document, but perhaps I'm just old-fashioned).

If you really need purely serial processing, you might consider STX as an
alternative. However, the existing STX implementations are far less
widely-used or mature than the popular XSLT implementations.
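
For a sense of what purely serial processing looks like, here is a rough Python sketch using a plain SAX handler. It is not STX and not an XSLT processor; it only shows the idea of keeping the letter and handling each recipient as its end tag arrives. It assumes the letter and each recipient contain text only, that the letter comes before the recipients, and that the letter uses a "{name}" placeholder. The placeholder and the names MailMergeHandler and merge are invented for the example.

```python
import xml.sax


class MailMergeHandler(xml.sax.ContentHandler):
    """Hypothetical serial processor for the mail-merge layout."""

    def __init__(self, emit):
        super().__init__()
        self.emit = emit    # callback that receives each finished letter
        self.letter = None  # the only state kept for the whole run
        self.stack = []     # names of the currently open elements
        self.text = []      # text of the element being collected

    def startElement(self, name, attrs):
        self.stack.append(name)
        if name in ("letter", "recipient"):
            self.text = []

    def characters(self, content):
        # Assumption: letter and recipient hold text only, no children.
        if self.stack and self.stack[-1] in ("letter", "recipient"):
            self.text.append(content)

    def endElement(self, name):
        self.stack.pop()
        if name == "letter":
            self.letter = "".join(self.text).strip()
        elif name == "recipient":
            if self.letter is None:
                raise ValueError("letter must precede the recipients")
            recipient = "".join(self.text).strip()
            # "{name}" is an invented placeholder convention.
            self.emit(self.letter.replace("{name}", recipient))
            self.text = []  # the recipient is dropped once processed


def merge(xml_text):
    """Return the merged letters, one per recipient, in document order."""
    out = []
    xml.sax.parseString(xml_text.encode("utf-8"), MailMergeHandler(out.append))
    return out
```

Memory use here depends on the size of the letter and of one recipient, not on the number of recipients. The price is that the code can only look at what it has already seen, which is why the letter has to come first.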

Michael Kay
http://www.saxonica.com/


> -----Original Message-----
> From: Andrzej Jan Taramina [mailto:andrzej@chaeron.com] 
> Sent: 03 December 2004 23:45
> To: xml-dev@lists.xml.org
> Subject: [xml-dev] Incremental transformations with Xalan and performance issues?
> 
> I'm in a situation where I need to parse some large documents, where the
> first few elements are a preamble with various parameters and the end of the
> document is a large list of entries.
> 
> Think of a mail merge, where the letter to be sent is defined first in the
> mail merge xml, followed by numerous recipient entries, something like this:
> 
> <mailmerge>
> 	<letter>
> 		...letter def goes here
> 	</letter>
> 	<recipients>
> 		<recipient>
> 			...recipient data
> 		</recipient>
> 		<recipient>
> 			...recipient data
> 		</recipient>
> 		etc...
> 	</recipients>
> </mailmerge>
> 
> What I was wondering was how Xalan handles the processing of such large
> documents (say a million recipient entries) when the parser is using SAX?
> 
> More specifically, if I create global variables such as:
> 
> 	<xsl:variable name="letterTemplate" select="/mailmerge/letter"/>
> 
> then later:
> 
> 	<xsl:template match="recipients/recipient">
> 		<!-- process the recipient using $letterTemplate -->
> 	</xsl:template>
> 
> Will the processing be incremental in nature, as SAX events are received by
> Xalan?  That is, is Xalan smart enough to create the global as soon as it
> can, followed by processing of each individual recipient as each related SAX
> event is received?  In that case, having the shared global info early in the
> document and the large list at the end would probably have beneficial
> performance implications.
> 
> Or will the whole document have to be instantiated as some sort of internal
> tree first?
> 
> Hopefully, it's incremental in nature, since otherwise we might blow out
> memory with such large documents.
> 
> Any insight into the implications of processing such large documents, using
> globals, xslt stylesheet structure, impact of element ordering in the
> document and the like would be very much appreciated.
> 
> Thanks!
> 
> 
> 
> 
> Andrzej Jan Taramina
> Chaeron Corporation: Enterprise System Solutions
> http://www.chaeron.com
> 
> 





 
