OASIS Mailing List Archives

   RE: Incremental transformations with Xalan and performance issues?

Michael:

Thanks for the response. BTW, I use your XSLT book as my primary 
reference...nice work!

> You might find it better to ask such questions on the xsl-list at
> mulberrytech.com, or if you're really interested only in Xalan, on a
> Xalan-specific forum.

Like many, I suffer from YAL (Yet Another List) syndrome and am hesitant to 
subscribe to any more lists, given how much stuff I already receive.  I knew 
some XSLT heavyweights (like yourself) hang out here, hence the decision to 
post to the xml-dev group.

I also think that as XML adoption continues to accelerate, transformations of 
extremely large documents using XSLT will be more and more a general concern 
to the community.

> In general, every mainstream XSLT processor today builds a tree
> representation of the input document in memory. I believe Xalan does parsing
> and transformation in parallel, but it still builds the tree. The fact that
> the parser and the transformer communicate using SAX is irrelevant - it just
> means that the transformer and not the parser is building the tree. (This
> isn't totally irrelevant, because the transformer can build a much more
> efficient tree knowing it is read-only. But it's still an in-memory tree.)

I might have to redesign how we handle our XML in that case, to keep each 
mailmerge recipient entry in a separate document, rather than have the whole 
thing as one monolithic document.

Do you happen to know if anyone has tried to build an XSLT engine that does 
incremental transformations on incoming SAX events, without requiring the 
building of a tree?  That kind of approach, where the transform is 
appropriate, would be much more efficient in memory usage and would allow 
transforms of documents of virtually unlimited size, I should think.  
Something to investigate...
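
To make the idea concrete, here is a minimal sketch of processing SAX events 
incrementally in Java. It is not an XSLT engine, just a hand-written handler 
that emits one line of output as each record ends, so memory use depends on 
the size of one record rather than the whole document. The element names 
"recipient" and "name" and the class name are hypothetical; the real mail 
merge schema is not shown here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: "recipient" and "name" are assumed element names.
class StreamingLetterHandler extends DefaultHandler {
    private final Writer out;                                // output is written as we go
    private final StringBuilder name = new StringBuilder(); // state for the current record only
    private boolean inName = false;

    StreamingLetterHandler(Writer out) {
        this.out = out;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("recipient".equals(qName)) {
            name.setLength(0);          // forget the previous record
        } else if ("name".equals(qName)) {
            inName = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inName) {
            name.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if ("name".equals(qName)) {
            inName = false;
        } else if ("recipient".equals(qName)) {
            try {
                // Emit the result for this record immediately; no tree is kept.
                out.write("Dear " + name.toString().trim() + ",\n");
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }
    }

    // Parses the input and writes one greeting line per recipient.
    static void transform(Reader xml, Writer out) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(xml), new StreamingLetterHandler(out));
        out.flush();
    }
}
```

This only works for transforms that never need to look back beyond the 
current record, which is the "where the transform is appropriate" caveat.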

> I can't speak for Xalan, but Saxon users are running transformations up to
> 200Mb or so without too much trouble, and at speeds up to 10Mb/sec. It
> requires a little care in configuring the memory allocation, and in writing
> the stylesheet to avoid non-linear constructs, but it's certainly doable.
> Beyond that, it probably gets difficult. 

I'm using Xalan (inside Cocoon), and for this task have not yet figured out a 
way to use Saxon due to some extensions I'm using.  More specifically, I need 
to get/put stuff into the session, and am using something like this (in Xalan):

<xalan:component prefix="javaSession">
	<xalan:script lang="javaclass"
		src="xalan://org.apache.cocoon.environment.Session"/>
</xalan:component>

Then have templates like:

<xsl:template name="javaCall:setSessionAttribute">
	<xsl:param name="attributeName" select="'unknown'"/>
	<xsl:param name="attributeValue"/>
	<xsl:param name="session"/>

	<xsl:variable name="dummy"
		select="javaSession:setAttribute($session, $attributeName, $attributeValue)"/>
</xsl:template>
	
<xsl:template name="javaCall:getSessionAttribute">
	<xsl:param name="attributeName" select="'unknown'" />
	<xsl:param name="session"/>
		
	<xsl:copy-of select="javaSession:getAttribute( $session, $attributeName )"/>
</xsl:template>

The session parameter is a reference to the user's session that is passed in 
from the calling stylesheet with a bit of magic from a custom Cocoon 
transformer class.

This works fine with Xalan: if you save a tree fragment and then retrieve 
it, you end up with a node list/tree fragment as desired.  With Saxon, 
however, if I instead use the saxon component definition:

<saxon:script language="java"
	implements-prefix="javaSession"
	src="java:org.apache.cocoon.environment.Session"/>

I can save a result fragment, but when I retrieve it, I don't get a node 
list/tree fragment.  Haven't figured out how to correct this yet with Saxon.

If it weren't for this, I could freely switch between the two XSLT engines 
with a build parameter.

> You don't actually say what you mean
> by a "large document". (Personally, I am amazed to see people handling a 200Mb
> database as a single in-memory document, but perhaps I'm just old-fashioned).

I'm not sure yet...the client has not given me any indication of how big the 
mail merge might be.  1M letters would hit the database limit of 2GB for 
the XML document in the table column (CLOB).  100K letters would hit the 
200MB level that you mentioned.  (Both figures assume roughly 2KB of XML per 
letter.)

I'd rather implement a solution that has no limitations, so given the lack of 
a true "incremental/SAX" based transformer implementation, I'm thinking that 
I'll need to move away from the monolithic document approach and store each 
recipient's info in a separate small document to work around the current XSLT 
document size limitations.
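
A minimal sketch of that per-recipient approach, using only the standard JAXP 
API: the stylesheet is compiled once into a Templates object and then applied 
to each small document in turn, so only one recipient's tree is in memory at 
a time. The class name, the inline stylesheet, and the "recipient"/"name" 
element names are hypothetical placeholders, not the actual application code, 
and the documents are passed as strings only to keep the example 
self-contained.

```java
import java.io.StringReader;
import java.io.Writer;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: one small XML document per recipient, one compiled stylesheet.
class PerRecipientMerge {
    // Placeholder stylesheet; a real letter template would go here.
    static final String LETTER_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/recipient'>"
      + "<xsl:text>Dear </xsl:text><xsl:value-of select='name'/><xsl:text>,&#10;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms each recipient document separately and appends the result to out.
    static void mergeAll(Iterable<String> recipientDocs, Writer out) throws Exception {
        // Compile the stylesheet once; Templates can be reused for every document.
        Templates compiled = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(LETTER_XSLT)));

        for (String doc : recipientDocs) {
            // A fresh Transformer per document; its input tree can be discarded afterwards.
            Transformer t = compiled.newTransformer();
            t.transform(new StreamSource(new StringReader(doc)), new StreamResult(out));
        }
        out.flush();
    }
}
```

In practice the Iterable would be backed by something like a database cursor 
over the per-recipient rows, so the full set of documents never has to be 
loaded at once.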

> If you really need purely serial processing, you might consider STX as an
> alternative. However, the existing STX implementations are far less
> widely-used or mature than the popular XSLT implementations.

That's not an option in our case, since we rely on XSLT so much.


Andrzej Jan Taramina
Chaeron Corporation: Enterprise System Solutions
http://www.chaeron.com


Copyright 2001 XML.org. This site is hosted by OASIS