Re: [xml-dev] XML parsing @ 100MB-1000MB/sec/GHz with Parallel Bit Streams

From: Rob Cameron <cameron@cs.sfu.ca>
To: John Snelson <john.snelson@oracle.com>
Date: Mon, 25 Feb 2008 11:08:51 -0800

Thanks for that report, John.

For research purposes, my present work with Parabix is focused
entirely on CPU time for parsing once the data is available. This is
where the parallel bit stream methods make a difference. Of course,
the I/O will have to be optimized at some point.
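
The two figures in John's test (wallclock vs. user time) can be measured separately with standard POSIX calls. The following is a minimal sketch, not the markup_stats code: `time_work`, `spin` and the `timing` struct are hypothetical names, and `spin` is only a stand-in for a CPU-bound parsing pass. Wall-clock time comes from `clock_gettime(CLOCK_MONOTONIC)`; user CPU time comes from `getrusage(RUSAGE_SELF)`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <time.h>

/* Elapsed wall-clock seconds from a monotonic clock. */
static double wall_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* User-mode CPU seconds consumed so far by this process. */
static double user_cpu_seconds(void) {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6;
}

/* Hypothetical result type: both measurements for one run. */
typedef struct {
    double wall;
    double user;
} timing;

/* Run work(arg) once and report wall-clock and user CPU time for it. */
static timing time_work(void (*work)(void *), void *arg) {
    assert(work != NULL);
    double w0 = wall_seconds();
    double u0 = user_cpu_seconds();
    work(arg);
    timing t = { wall_seconds() - w0, user_cpu_seconds() - u0 };
    return t;
}

/* Hypothetical CPU-bound stand-in for parsing: sum n integers.
   The volatile accumulator keeps the compiler from deleting the loop. */
static void spin(void *arg) {
    unsigned long n = *(unsigned long *)arg;
    volatile unsigned long acc = 0;
    for (unsigned long i = 0; i < n; i++)
        acc += i;
}
```

For a purely CPU-bound loop like `spin` the two numbers come out close. In John's Parabix run, the gap between 21.7s wallclock and 4.0s user time is time the process spent outside its own user-mode code, for example in the kernel or waiting on the disk.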

On the memory usage front, we presently read the whole file into
memory in one big slurp to simplify the research work, so for now
memory use does grow with the size of the document. However, much
of the design is organized around a streaming model.
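
The difference between the two memory models can be sketched in C. This is a minimal illustration under stated assumptions, not Parabix code: `count_lt`, `count_lt_slurp` and `count_lt_streaming` are hypothetical names, and counting `<` bytes is only a stand-in for a real parsing pass. The slurp version allocates a buffer as large as the file; the streaming version reuses one fixed-size buffer, so its peak memory does not depend on document size.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a parsing pass: count '<' bytes in a buffer. */
static size_t count_lt(const char *buf, size_t len) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '<')
            n++;
    return n;
}

/* "Big slurp": read the whole file into one heap buffer, so peak memory
   grows with the document. fseek/ftell sizing assumes a regular file.
   Returns (size_t)-1 on error. */
static size_t count_lt_slurp(FILE *f) {
    if (fseek(f, 0, SEEK_END) != 0)
        return (size_t)-1;
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0)
        return (size_t)-1;
    char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL)
        return (size_t)-1;
    size_t got = fread(buf, 1, (size_t)size, f);
    size_t n = count_lt(buf, got);
    free(buf);
    return n;
}

/* Streaming: read from the current position in chunk_size pieces through
   one reused buffer, so peak memory is bounded by chunk_size.
   Returns (size_t)-1 on allocation failure. */
static size_t count_lt_streaming(FILE *f, size_t chunk_size) {
    assert(chunk_size > 0);
    char *buf = malloc(chunk_size);
    if (buf == NULL)
        return (size_t)-1;
    size_t n = 0, got;
    while ((got = fread(buf, 1, chunk_size, f)) > 0)
        n += count_lt(buf, got);
    free(buf);
    return n;
}
```

Counting single bytes sidesteps one real complication of the streaming model: a parser that works chunk by chunk must carry its state across chunk boundaries, since a tag (or a block of bit streams) can straddle two reads.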

> Hi Rob,
> 
> Those are very impressive figures! I downloaded your parser and did a
> quick test to compare it to Expat parsing a 1.1GB XML file:
> 
> Expat: 21.0s (wallclock), 18.2s (user time)
> Parabix: 21.7s (wallclock), 4.0s (user time)
> 
> I used the "markup_stats" program that came with Parabix. Clearly
> Parabix is spending less time with heavy CPU load (user time), but it
> still takes longer to parse when disk IO is included (wallclock time).
> Parabix also seems to take far more memory than Expat - proportional to
> the size of the document?
> 
> Can the IO and memory usage in "markup_stats" be improved, or is this an
> intrinsic problem with your approach to XML parsing?
> 
> John
> 
-- 
Robert D. Cameron, Ph.D.
Professor of Computing Science
Simon Fraser University
