xml-dev - RE: transformations

From: Didier PH Martin <martind@netfolder.com>
To: "Simon St.Laurent" <simonstl@simonstl.com>,XML-Dev Mailing list <xml-dev@xml.org>
Date: Mon, 20 Nov 2000 10:43:47 -0500

Hi Simon,

[post about transformation....]

Didier replies:
The problem is not so much with the XSLT language as with the
implementations. The only real limitation of the language is that there is
no standard way to access legacy data from within a transformation sheet.
Instead, a two-tier process is required: a) XML document production, then
b) XML document/fragment aggregation and, of course, the transformation
itself. Here are the limitations of current implementations that I noticed
in experiments performed in Talva's labs.

a) Most implementations written in Java do not scale well on SMP machines.
This seems to be because most VMs have all the threads share the same heap.
A single resource is therefore shared by multiple processors, and this
prevents quasi-linear scalability. A better implementation for
heap-intensive processing would have one heap per thread. In that case there
are no lock problems, and this translates into a processing improvement
(more transformations per second). When your servers are hosted, it is a lot
cheaper to have SMP machines than single-CPU machines. So with quad SMPs and
proper memory/heap management you may need a quarter of the space you would
need with lousy implementations.

b) Most implementations do not provide any mechanism that allows aggregation
to occur in parallel. The problem is that if I want to integrate the content
of several web services (let's say 10), the inclusion/aggregation process
occurs in sequence. A better implementation would provide a parallel
aggregation mechanism. Michael Kay said that this could be done within the
requirements of the current specification. However, nothing has appeared
under the sun yet that provides this kind of feature.
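
To make the sequential versus parallel distinction concrete, here is a
minimal sketch in Python. It is not taken from any XSLT processor. The
fetch_fragment function is a hypothetical stand-in for a call to a web
service (the sleep only simulates network latency), and the <aggregate> and
<item> element names are assumptions made for the illustration.

```python
import time
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def fetch_fragment(service_id: int) -> str:
    """Hypothetical stand-in for a remote call; the sleep simulates latency."""
    time.sleep(0.05)
    return f'<item source="service-{service_id}">content {service_id}</item>'


def aggregate(service_ids, parallel: bool = True) -> ET.Element:
    """Collect one fragment per service under a single <aggregate> root."""
    service_ids = list(service_ids)
    if parallel:
        # One worker per source: the fetches overlap instead of queuing up.
        # pool.map() still returns the results in input order.
        with ThreadPoolExecutor(max_workers=max(1, len(service_ids))) as pool:
            fragments = list(pool.map(fetch_fragment, service_ids))
    else:
        # Sequential inclusion: each fetch waits for the previous one.
        fragments = [fetch_fragment(i) for i in service_ids]

    root = ET.Element("aggregate")
    for fragment in fragments:
        root.append(ET.fromstring(fragment))
    return root
```

Calling aggregate(range(10)) and aggregate(range(10), parallel=False) should
produce the same document. Only the waiting time differs, since in the
parallel case the simulated latencies overlap rather than add up. The
aggregated tree can then be handed to the transformation step as usual.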

c) Nobody said that an XSLT transformation sheet should always be in text
format. Last June, a presentation was made about an infoset VM. The
implication is that this VM runs on op codes. This makes it possible to
create optimized code. For example, the optimizer may: 1) extract from a
source XML file only the node list required for processing, not the entire
infoset, 2) check for any operations that can be performed in parallel,
3) manage the heaps and the threads well, 4) manage virtual memory well, so
that an info set is localized in the same memory page cluster, etc....
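
Point 1) can be illustrated with a small sketch in Python. This is not an
infoset VM, only a streaming filter that shows the idea of keeping the nodes
a transformation needs and discarding the rest while parsing. The function
name extract_nodes and the sample document in the note below are assumptions
made for the example.

```python
import io
import xml.etree.ElementTree as ET


def extract_nodes(source, wanted_tag: str):
    """Yield each outermost <wanted_tag> element as text, dropping the rest."""
    depth = 0  # greater than 0 while we are inside a wanted element
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if depth or elem.tag == wanted_tag:
                depth += 1
            continue
        if depth:
            depth -= 1
            if depth == 0:
                # End of an outermost wanted element: serialize it without
                # any trailing text, then release its content.
                elem.tail = None
                yield ET.tostring(elem, encoding="unicode")
                elem.clear()
        else:
            # Outside any wanted element: free the content immediately.
            elem.clear()
```

For instance, with the document
<catalog><book id="1"><title>A</title></book><junk>x</junk></catalog>
wrapped in an io.BytesIO, list(extract_nodes(stream, "title")) gives the
single string <title>A</title>. The <junk> content is released as soon as it
has been parsed. Emptied elements still stay attached to their parent until
the parent itself is cleared, so a real optimizer would have to go further
than this.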

d) Other improvements that I am not thinking of right now...

In conclusion: we have, in our hands, first-generation XSLT processors.
Most of them are implemented in a language that does not lend itself to
high-performance processors (but that was great for building quick and dirty
implementations and proofs of concept). No current engine implements any
sophisticated optimization mechanism. No current engine seems to manage info
set memory allocation well. In a word, most of these engines are great
prototypes, and now we need to see/get/work with the real stuff.

Cheers
Didier PH Martin
----------------------------------------------
Email: martind@netfolder.com
Conferences: xml devcon 2000 (http://www.xmldevcon2000.com)
		 Wireless Summit NY (http://www.pulver.com)
	       xml devcon 2001 London (http://www.xmldevcon2000.com)
Book: XML Professional (http://www.wrox.com)
column: Style Matters (http://www.xml.com)

References:
- transformations
  - From: "Simon St.Laurent" <simonstl@simonstl.com>