RE: [xml-dev] Fast validating XML parser
- From: noah_mendelsohn@us.ibm.com
- To: "Michael Kay" <mike@saxonica.com>
- Date: Mon, 22 Oct 2007 17:18:23 -0400
I agree with Mike's intuition. Beyond that, you'd have to do more than
say "fast". If you said something like: on a 2 GHz Xeon we need to parse
and validate 1000 documents per second, of average size 10K bytes
each, with moderately dense markup, and throwing SAX events as the API,
then it's possible that someone would have an intuition as to whether
off-the-shelf parsers such as Xerces can do it. Of course, your mileage will
vary according to the details, but saying "I need a fast parser" is a bit
like saying "I need a fast car". What you mean by fast may depend on
whether you're driving NASCAR, Formula 1, or just trying to make good time
on a vacation.
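To make the hypothetical requirement above concrete, here is a rough
back-of-the-envelope sketch in Java. It assumes "10K bytes" means 10,000
bytes and that one core is fully available; the class and method names are
made up for illustration. Under those assumptions the target works out to
about 10 MB per second, or a budget of roughly 200 clock cycles per input
byte for parsing, validation and SAX event delivery combined.

```java
// Rough throughput budget for the example requirement (illustrative only).
final class ThroughputBudget {

    // Input bytes that must be consumed each second.
    static double bytesPerSecond(double docsPerSecond, double bytesPerDoc) {
        return docsPerSecond * bytesPerDoc;
    }

    // Clock cycles available per input byte on a single core.
    static double cyclesPerByte(double clockHz, double docsPerSecond, double bytesPerDoc) {
        return clockHz / bytesPerSecond(docsPerSecond, bytesPerDoc);
    }

    public static void main(String[] args) {
        double clockHz = 2e9;          // 2 GHz Xeon, from the example
        double docsPerSecond = 1000;   // target rate, from the example
        double bytesPerDoc = 10_000;   // assumption: "10K" read as 10,000 bytes

        System.out.printf("bytes per second: %.0f%n",
                bytesPerSecond(docsPerSecond, bytesPerDoc));
        System.out.printf("cycles per byte:  %.0f%n",
                cyclesPerByte(clockHz, docsPerSecond, bytesPerDoc));
    }
}
```

Stating the requirement in this form makes it easier to compare against
published parser benchmarks, which are usually given in bytes per second.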
For what it's worth, my group published a paper on some experimental work
we did on high performance validation a few years ago. The parser we
described was a prototype, and it remains difficult (as far as I know) to
find off the shelf parsers that give quite the speed we reported.
Nonetheless, the paper includes some benchmarks for then-current versions
of Xerces doing validation. Those are not official Apache or IBM
benchmarks, but they were run with some care, and I expect that Xerces has
probably improved a bit in speed since then. So, you might want to check
out the paper. It also explains in great detail some of the factors that
we found to be issues when trying to parse and validate at high speed.
Copies are available online at [1]. Unless you have a strong preference
for HTML, I suggest that you read the PDF version; the formatting is much
better.
Noah
[1] http://www2006.org/programme/item.php?id=5011
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
"Michael Kay" <mike@saxonica.com>
10/22/2007 03:42 PM
To: "'Llacuna, Phillip V'" <phillip.v.llacuna@lmco.com>,
<xml-dev@lists.xml.org>
cc: (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: RE: [xml-dev] Fast validating XML parser
I suspect that an off-the-shelf parser like Xerces is quite fast enough if
your application invokes it intelligently. You might find parsers that are
20% faster than that, but I think the order-of-magnitude improvement will
come by changing your application architecture: in particular, change the
driving code from JavaScript to Java.
Xerces has a fairly high start-up cost so it's worth reusing the parser
for multiple documents. However, that's more of a factor when your files
are 200 bytes rather than 50K bytes.
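As a minimal sketch of the reuse suggested above, the following Java
program validates a whole batch of files against their DTDs inside one JVM
with one parser instance, rather than starting a new Java process per file.
It uses only the standard JAXP and SAX APIs; the class name
BatchDtdValidator and the error-message format are invented for
illustration, and it assumes each file names its DTD in a DOCTYPE
declaration.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class BatchDtdValidator {

    // Records validity errors; without an ErrorHandler they may go unreported.
    static final class Collector extends DefaultHandler {
        final List<String> problems = new ArrayList<>();
        String currentFile = "";

        @Override
        public void error(SAXParseException e) {
            problems.add(currentFile + ":" + e.getLineNumber() + ": " + e.getMessage());
        }
    }

    // Validates every file with a single reused XMLReader.
    static List<String> validateAll(List<File> files)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);              // DTD validation
        XMLReader reader = factory.newSAXParser().getXMLReader();
        Collector collector = new Collector();
        reader.setErrorHandler(collector);

        for (File file : files) {
            collector.currentFile = file.getName();
            try {
                reader.parse(file.toURI().toString());
            } catch (SAXParseException e) {
                // Well-formedness (fatal) errors end this file; continue with the next.
                collector.problems.add(collector.currentFile + ":" + e.getLineNumber()
                        + ": fatal: " + e.getMessage());
            }
        }
        return collector.problems;
    }

    public static void main(String[] args) throws Exception {
        List<File> files = new ArrayList<>();
        for (String arg : args) {
            files.add(new File(arg));
        }
        List<String> problems = validateAll(files);
        for (String problem : problems) {
            System.out.println(problem);
        }
        System.out.println(files.size() + " file(s) checked, "
                + problems.size() + " problem(s)");
    }
}
```

Run it once with all the file names as arguments, for example
"java BatchDtdValidator main.xml part0001.xml part0002.xml" (file names
hypothetical), so that JVM start-up and parser construction are paid once
for the whole project instead of once per file.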
Michael Kay
http://www.saxonica.com/
From: Llacuna, Phillip V [mailto:phillip.v.llacuna@lmco.com]
Sent: 22 October 2007 19:32
To: xml-dev@lists.xml.org
Subject: [xml-dev] Fast validating XML parser
Hi:
We need a very fast validating XML parser and were wondering if anyone has
any suggestions. Our project involves one main XML file with about 1200
supporting XML files (each about 50KB or less). Our current environment
calls on a java script to validate each file against the DTD, but it is
painfully slow to process the complete project. We suspect that the
overhead in creating the java environment each time the script is called
is slowing down the process. I have searched (and am still searching) the
web for a good alternative. Any suggestions?
Phillip Llacuna
Multi-media Design Engineer
Lockheed Martin
Ph: (651) 456-7152
Fax: (651) 456-2643