Michael:
Thanks for the response. BTW, I use your XSLT book as my primary
reference...nice work!
> You might find it better to ask such questions on the xsl-list at
> mulberrytech.com, or if you're really interested only in Xalan, on a
> Xalan-specific forum.
Like many, I suffer from YAL (Yet Another List) syndrome and am hesitant to
subscribe to any more lists, given how much mail I already receive. I knew
some XSLT heavyweights (like yourself) hang out here, hence the decision to
post to the xml-dev group.
I also think that as XML adoption continues to accelerate, transforming
extremely large documents with XSLT will become an increasingly general
concern for the community.
> In general, every mainstream XSLT processor today builds a tree
> representation of the input document in memory. I believe Xalan does parsing
> and transformation in parallel, but it still builds the tree. The fact that
> the parser and the transformer communicate using SAX is irrelevant - it just
> means that the transformer and not the parser is building the tree. (This
> isn't totally irrelevant, because the transformer can build a much more
> efficient tree knowing it is read-only. But it's still an in-memory tree.)
In that case, I might have to redesign how we handle our XML, keeping each
mail-merge recipient entry in a separate document rather than holding the
whole thing as one monolithic document.
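A minimal sketch of what the per-recipient approach could look like, assuming plain JAXP/TrAX outside Cocoon: compile the stylesheet once into a `Templates` object and run it over each small document in turn, so memory use is bounded by the largest single recipient rather than by the whole merge. The `<recipient>`/`<name>` element names and the stylesheet itself are hypothetical placeholders, not our actual schema.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PerRecipientTransform {

    // Hypothetical stylesheet for illustration: one <recipient> document in,
    // one line of letter text out.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/recipient'>Dear <xsl:value-of select='name'/>,</xsl:template>"
        + "</xsl:stylesheet>";

    /** Compiles the stylesheet once, then runs it over each small document in turn. */
    public static List<String> transformAll(List<String> recipientDocs)
            throws TransformerException {
        // Templates is the reusable, thread-safe compiled form of the stylesheet.
        Templates compiled = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(XSLT)));
        List<String> letters = new ArrayList<>();
        for (String doc : recipientDocs) {
            // Only this one recipient's input tree is built in memory at a time.
            Transformer transformer = compiled.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(doc)),
                                  new StreamResult(out));
            letters.add(out.toString());
        }
        return letters;
    }

    public static void main(String[] args) throws TransformerException {
        List<String> docs = List.of(
            "<recipient><name>Ada</name></recipient>",
            "<recipient><name>Alan</name></recipient>");
        for (String letter : transformAll(docs)) {
            System.out.println(letter);
        }
    }
}
```

Collecting results in a list is just for the demo; in a real run each letter would be written out as soon as it is produced.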
Do you happen to know whether anyone has tried to build an XSLT engine that
transforms incrementally on incoming SAX events, without building a tree?
Where the stylesheet allows it, that approach should be much more
memory-efficient and, I would think, could handle documents of virtually
unlimited size. Something to investigate...
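To illustrate the incremental idea (this is hand-written SAX code, not an XSLT engine), a minimal sketch: a handler reacts to parser events directly, holds only the current recipient's data, and emits each letter the moment that recipient's end tag arrives. The `<mailmerge>`/`<recipient>`/`<name>` element names are hypothetical.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class StreamingMerge {

    /** Reacts to SAX events directly; no document tree is ever built. */
    static class RecipientHandler extends DefaultHandler {
        private final Consumer<String> sink;            // receives each finished letter
        private final StringBuilder text = new StringBuilder();
        private String name;                            // current recipient's data only

        RecipientHandler(Consumer<String> sink) {
            this.sink = sink;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) {
            text.setLength(0);
            if (qName.equals("recipient")) {
                name = null;                            // start a fresh record
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("name")) {
                name = text.toString().trim();
            } else if (qName.equals("recipient")) {
                // Emit as soon as the record closes, then forget it.
                sink.accept("Dear " + name + ",");
                name = null;
            }
            text.setLength(0);
        }
    }

    public static void merge(String xml, Consumer<String> sink) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), new RecipientHandler(sink));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<mailmerge>"
            + "<recipient><name>Ada</name></recipient>"
            + "<recipient><name>Alan</name></recipient>"
            + "</mailmerge>";
        merge(xml, System.out::println);
    }
}
```

The trade-off is that this gives up XSLT's random access: anything that needs to look at other recipients (sorting, sibling lookups, keys) no longer fits a single forward pass.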
> I can't speak for Xalan, but Saxon users are running transformations up to
> 200MB or so without too much trouble, and at speeds up to 10MB/sec. It
> requires a little care in configuring the memory allocation, and in writing
> the stylesheet to avoid non-linear constructs, but it's certainly doable.
> Beyond that, it probably gets difficult.
I'm using Xalan (inside Cocoon), and for this task I have not yet figured out
a way to use Saxon because of some extensions I rely on. More specifically, I
need to get and put values in the session, which I do with something like
this (in Xalan):
<xalan:component prefix="javaSession">
  <xalan:script lang="javaclass"
                src="xalan://org.apache.cocoon.environment.Session"/>
</xalan:component>
Then have templates like:
<xsl:template name="javaCall:setSessionAttribute">
  <xsl:param name="attributeName" select="'unknown'" />
  <xsl:param name="attributeValue"/>
  <xsl:param name="session"/>
  <!-- "dummy" is never used; it exists only so the extension call is evaluated -->
  <xsl:variable name="dummy"
                select="javaSession:setAttribute( $session, $attributeName,
                                                  $attributeValue )"/>
</xsl:template>

<xsl:template name="javaCall:getSessionAttribute">
  <xsl:param name="attributeName" select="'unknown'" />
  <xsl:param name="session"/>
  <xsl:copy-of select="javaSession:getAttribute( $session, $attributeName )"/>
</xsl:template>
The session parameter is a reference to the user's session that is passed in
from the calling stylesheet with a bit of magic from a custom Cocoon
transformer class.
This works fine with Xalan: if you save a tree fragment and then retrieve it,
you get back a node list/tree fragment, as desired. With Saxon, however, if I
instead use the Saxon component definition:
<saxon:script language="java"
implements-prefix="javaSession"
src="java:org.apache.cocoon.environment.Session"/>
I can save a result fragment, but when I retrieve it, I don't get a node
list/tree fragment back. I haven't yet figured out how to correct this with
Saxon. If it weren't for this, I could freely switch between the two XSLT
engines with a build parameter.
> You don't actually say what you mean
> by a "large document". (Personally, I am amazed to see people handling a 200MB
> database as a single in-memory document, but perhaps I'm just old-fashioned).
I'm not sure yet; the client hasn't given me any indication of how big the
mail merge might be. One million letters would hit the database's 2GB limit
for the XML document in the table column (a CLOB). 100K letters would hit the
200MB level that you mentioned.
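As a rough consistency check, the two figures above imply an average of about 2KB of XML per letter (an inference from the numbers, not a measured size):

```latex
\frac{2\ \text{GB}}{10^{6}\ \text{letters}} \approx 2\ \text{KB per letter},
\qquad
10^{5}\ \text{letters} \times 2\ \text{KB} \approx 200\ \text{MB}
```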
I'd rather implement a solution without such limits, so in the absence of a
true incremental, SAX-based transformer, I'm thinking I'll need to move away
from the monolithic-document approach and store each recipient's information
in a separate small document to work around the current XSLT document-size
limitations.
> If you really need purely serial processing, you might consider STX as an
> alternative. However, the existing STX implementations are far less
> widely-used or mature than the popular XSLT implementations.
That's not an option in our case, since we rely on XSLT so heavily.
Andrzej Jan Taramina
Chaeron Corporation: Enterprise System Solutions
http://www.chaeron.com