RE: [xml-dev] Schematron Best Practice: A Schematron schema's area of responsibility?

From: noah_mendelsohn@us.ibm.com
To: "Mark Delaney" <MARKD@microen.com>
Date: Tue, 17 Jul 2007 19:15:50 -0400

Mark Delaney asks:

> Are there order-of-magnitude variations in efficiency, in either memory
> use or time, between alternative languages? If so, are these variations
> essential, or merely a quirk of the available implementations?

I think that in practice that's likely to be more a question about 
particular implementations than about the languages in general.  Furthermore, 
the actual performance you see will be a function of the decisions made in 
the particular validator you choose and the particular schema features you 
use.  My intuition is that if you use relatively simple XPaths, and if you 
don't use features like identity constraints aggressively in XSD, then the 
theoretically achievable performance for a Schematron schema and an XSD 
schema should be on the same order.  Then again, I 
would expect that if you compare multiple implementations of either one of 
these languages, the performance could vary literally by, say, 2 orders of 
magnitude according to how much attention was paid to performance 
optimization.

As an example of the sorts of factors that can come into play if you want 
to optimize really aggressively, I can (self-servingly) point you to a 
paper my team published at WWW2006 on a very high performance XSD 
implementation [1].  The message you should get is that with real care, 
XSD can run much faster than you usually see it run.  In fact, you can do 
validations with roughly zero overhead compared to many of the better 
nonvalidating parsers you've seen (e.g. Expat), at least in many cases. 
Then again, that implementation is very complex, depends on advance 
compilation, has never been productized for general use, and is not 
typical of what's likely available to you in practice today.  Still, it 
shows what can be done with the XSD language.

I have less experience with Schematron, but my intuition is that a really 
aggressive Schematron implementation could reach similar speeds, 
especially with XPaths that stream well, or that can be rewritten to 
stream well (Fabio Vitali's team in Bologna claims that's pretty much all 
XPaths).  Streaming and good validation performance often go together, 
since any non-streaming behavior tends to mean you're revisiting things 
you've seen before, and that's overhead.  See our paper for more on that 
line of thinking. 
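
To make the streaming point concrete, here is a minimal sketch in Python 
(not taken from our paper, and not how any particular Schematron or XSD 
processor works).  It checks one simple rule in a single forward pass: 
every "item" element must have a "price" child, roughly what a Schematron 
rule with context="item" and test="price" would assert.  The element 
names and the function name are hypothetical, chosen only for 
illustration.

```python
import io
import xml.etree.ElementTree as ET


def check_streaming(source, context_tag, required_child):
    """Return the 1-based positions of <context_tag> elements that
    have no direct <required_child> child, found in one forward pass.

    Illustrative sketch only: handles a single child-existence test,
    not general XPath.
    """
    failures = []
    seen = 0
    # "end" events fire once an element and all of its children have
    # been parsed, so the child test can be answered right there.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == context_tag:
            seen += 1
            if elem.find(required_child) is None:
                failures.append(seen)
            # Drop the subtree we have finished with, so we never go
            # back to it.  (The emptied element itself stays attached
            # to its parent, so memory is reduced, not constant.)
            elem.clear()
    return failures


if __name__ == "__main__":
    doc = (b"<order><item><price>1</price></item>"
           b"<item/><item><name>x</name></item></order>")
    print(check_streaming(io.BytesIO(doc), "item", "price"))
```

Each element is examined once, when it closes, and then discarded.  An 
assertion that needed to look back at earlier siblings or at arbitrary 
other parts of the document would force you either to keep more of the 
tree around or to revisit it, which is the overhead described above.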

I also have the intuition that Schematron has often been used in 
interactive scenarios, where performance tuning is likely a lower 
priority.  Certainly running a general purpose XSL transform to create 
reports is likely to be slow, unless you have a very, very aggressively 
tuned XSL processor.

So, I think it's likely to be the implementations rather than the 
languages that you want to look at, unless your use of Schematron turns 
out to require very general XSL processing, in which case it could be 
significantly slower.  That last claim is based completely on intuition, 
and Rick is welcome to suggest that I am wildly off base (I suppose he's 
welcome to do that on all the claims above).

Noah

[1] http://www2006.org/programme/item.php?id=5011

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Follow-Ups:
- RE: [xml-dev] Schematron Best Practice: A Schematron schema's area of responsibility?
  - From: "Rick Jelliffe" <rjelliffe@allette.com.au>

References:
- RE: [xml-dev] Schematron Best Practice: A Schematron schema's area of responsibility?
  - From: "Mark Delaney" <MARKD@microen.com>
