RE: [xml-dev] Schematron Best Practice: A Schematron schema's area of responsibility?
- From: noah_mendelsohn@us.ibm.com
- To: "Mark Delaney" <MARKD@microen.com>
- Date: Tue, 17 Jul 2007 19:15:50 -0400
Mark Delaney asks:
> Are there order-of-magnitude variations in efficiency, in either memory
> use or time, between alternative languages? If so, are these variations
> essential, or merely a quirk of the available implementations?
I think that in practice that's likely to be more a question about
particular implementations than about the languages in general. Furthermore,
the actual performance you see will be a function of the decisions made in
the particular validator you choose and the particular schema features you
use. My intuition is that if you use relatively simple XPaths, and if you
don't use features like identity constraints aggressively in XSD, then the
theoretically achievable performance for a Schematron schema and an XSD
schema should be of the same order of magnitude. Then again, I
would expect that if you compare multiple implementations of either one of
these languages, the performance could vary literally by, say, 2 orders of
magnitude according to how much attention was paid to performance
optimization.
As an example of the sorts of factors that can come into play if you want
to optimize really aggressively, I can (self-servingly) point you to a
paper my team published at WWW2006 on a very high performance XSD
implementation [1]. The message you should get is that with real care,
XSD can run much faster than you usually see it run. In fact, you can do
validations with roughly zero overhead compared to many of the better
nonvalidating parsers you've seen (e.g. Expat), at least in many cases.
Then again, that implementation is very complex, depends on advance
compilation, has never been productized for general use, and is not
typical of what's likely available to you in practice today. Still, it
shows what can be done with the XSD language.
I have less experience with Schematron, but my intuition is that a really
aggressive Schematron implementation could reach similar speeds,
especially with XPaths that stream well, or that can be rewritten to
stream well (Fabio Vitali's team in Bologna claims that's pretty much all
XPaths.) Streaming and good validation performance often go together,
since any non-streaming behavior tends to mean you're revisiting things
you've seen before, and that's overhead. See our paper for more on that
line of thinking.
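To make the streaming point concrete, here is a minimal sketch in Python. It
is not taken from the paper or from any Schematron implementation; the
function name check_streaming, the element name "item" and the attribute
name "price" are all made up for illustration. It checks one simple rule
(every "item" element carries a "price" attribute) in a single forward pass,
never going back to anything it has already seen:

```python
import io
import xml.etree.ElementTree as ET


def check_streaming(xml_bytes, context="item", required_attr="price"):
    """Count `context` elements that lack `required_attr`, in one forward pass.

    Hypothetical example rule; roughly what a Schematron assertion with
    context "item" and test "@price" would check.
    """
    failures = 0
    # iterparse yields each element once, when its end tag is reached,
    # so no part of the document is visited a second time.
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == context:
            if required_attr not in elem.attrib:
                failures += 1
            # Discard the element's attributes and children once checked.
            # The emptied element itself stays attached to its parent.
            elem.clear()
    return failures


doc = b'<order><item price="1"/><item/><item price="3"/></order>'
print(check_streaming(doc))
```

A rule whose XPath reaches backward or across the document (for example,
comparing each item against one that appeared earlier) would not fit this
shape without either buffering or rewriting, which is where the extra cost
tends to come from.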
I also have the intuition that Schematron has often been used in
interactive scenarios, where performance tuning is likely a lower
priority. Certainly running a general purpose XSL transform to create
reports is likely to be slow, unless you have a very, very aggressively
tuned XSL processor.
So, I think it's likely to be the implementations rather than the
languages that you want to look at, unless your use of Schematron turns
out to require very general XSL processing, in which case it could be
significantly slower. That last claim is based completely on intuition,
and Rick is welcome to suggest that I am wildly off base (I suppose he's
welcome to do that on all the claims above.)
Noah
[1] http://www2006.org/programme/item.php?id=5011
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------