XML parsing @ 100MB-1000MB/sec/GHz with Parallel Bit Streams
- From: Rob Cameron <cameron@cs.sfu.ca>
- To: xml-dev@lists.xml.org
- Date: Mon, 25 Feb 2008 05:13:17 -0800
I am pleased to announce the availability of parabix-0.40, a high-performance
XML parsing engine prototype that can parse text-oriented XML documents
on commodity processors at over 200MB/sec per processor GHz and
data-oriented XML documents at speeds approaching that. At this point,
this includes correct parsing of correct documents and dispatch to markup
action routines using an in-line API for XML (ilax). As the parabix stack
is built out to incorporate validation and object creation, I am expecting
overall performance above 100MB/sec/GHz. With linear speed-up on
multicore processors and other improvements, 1000MB/sec/GHz is
foreseeable.
By way of comparison, XML Screamer (Kostoulas et al., WWW 2006) performs
parsing, validation and business object creation on commodity processors at
the rate of 23-46 MB/sec per processor GHz (MB/sec/GHz), a substantial
increase over the cited rate of 2.5-6 MB/sec/GHz for traditional validating
parsers.
This is very good performance for traditional character-at-a-time parsing,
taking advantage of a collection of techniques such as optimization
across layers and schema-based customization. As a benchmark,
100 MB/sec/GHz is cited as the limit on throughput achievable for a
simple character-at-a-time scanning loop.
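To make the MB/sec/GHz figures above easier to compare, they can be
restated as processor cycles spent per input byte. The following is only
a back-of-the-envelope sketch: it assumes 1 MB = 10^6 bytes and
1 GHz = 10^9 cycles per second, and the helper name cycles_per_byte is
made up for this illustration.

```python
def cycles_per_byte(mb_per_sec_per_ghz: float) -> float:
    """Convert a throughput in MB/sec/GHz to cycles per byte.

    Assumes 1 MB = 10**6 bytes and 1 GHz = 10**9 cycles/sec.
    """
    bytes_per_sec = mb_per_sec_per_ghz * 1_000_000
    cycles_per_sec = 1_000_000_000
    return cycles_per_sec / bytes_per_sec


# Rates quoted in this message (lower and upper ends of the cited ranges).
rates = [
    ("traditional validating parsers, low", 2.5),
    ("traditional validating parsers, high", 6),
    ("XML Screamer, low", 23),
    ("XML Screamer, high", 46),
    ("simple scanning loop limit", 100),
    ("parabix-0.40, text-oriented documents", 200),
]

for label, rate in rates:
    print(f"{label}: {rate} MB/sec/GHz ~ {cycles_per_byte(rate):.1f} cycles/byte")
```

Under these assumptions, 100 MB/sec/GHz corresponds to 10 cycles per
byte and 200 MB/sec/GHz to 5 cycles per byte.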
My research is investigating the development of very high-speed text
processing based on a fundamentally new approach: using parallel bit
streams to represent character data and the SIMD processor capabilities
of commodity CPUs to process these bit streams.
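As a rough illustration of the parallel bit stream idea (not the parabix
code itself), the sketch below transposes a byte string into eight bit
streams, one per bit position, and then finds every "<" with a handful
of bitwise operations on whole streams. Python integers stand in for
SIMD registers here, the transposition is done naively, and all function
names are hypothetical.

```python
def transpose_to_bit_streams(data: bytes) -> list[int]:
    """Return 8 integers; bit i of streams[k] is bit k of data[i]."""
    streams = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                streams[k] |= 1 << i
    return streams


def match_byte(streams: list[int], value: int, length: int) -> int:
    """Bit stream with bit i set wherever data[i] == value.

    Each step combines whole streams, so every input position is
    tested at once rather than one character at a time.
    """
    all_ones = (1 << length) - 1
    result = all_ones
    for k in range(8):
        if (value >> k) & 1:
            result &= streams[k]
        else:
            result &= ~streams[k] & all_ones
    return result


def positions(stream: int) -> list[int]:
    """List the indices of the set bits in a bit stream."""
    out = []
    i = 0
    while stream:
        if stream & 1:
            out.append(i)
        stream >>= 1
        i += 1
    return out


text = b"<a><b>hi</b></a>"
streams = transpose_to_bit_streams(text)
lt_stream = match_byte(streams, ord("<"), len(text))
print(positions(lt_stream))  # [0, 3, 8, 12]
```

On real hardware each stream would be held in SIMD registers, so one
bitwise instruction covers as many character positions as the register
has bits.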
I have first applied these techniques to the problem of UTF-8 to
UTF-16 transcoding, to achieve end-to-end speed-up of 3X to 25X
compared with standard iconv and similar implementations. The
open source implementation of u8u16 is available at
http://u8u16.costar.sfu.ca/ and the results have just been
presented at ACM PPoPP 2008 in Salt Lake City.
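To give a flavour of how bit streams apply to UTF-8, here is a small
sketch that classifies UTF-8 bytes with stream-wide bitwise logic and
uses the result to count how many UTF-16 code units the input would
need. It is not taken from u8u16 and does no actual transcoding; it
assumes the input is valid UTF-8, uses Python integers in place of SIMD
registers, and the function names are invented for the example.

```python
def bit_streams(data: bytes) -> list[int]:
    """Return 8 integers; bit i of streams[k] is bit k of data[i]."""
    streams = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                streams[k] |= 1 << i
    return streams


def utf16_length(data: bytes) -> int:
    """Count the UTF-16 code units needed for valid UTF-8 input."""
    mask = (1 << len(data)) - 1
    b = bit_streams(data)
    # Continuation bytes have the form 10xxxxxx.
    continuation = b[7] & ~b[6] & mask
    # Lead bytes of 4-byte sequences have the form 11110xxx; in valid
    # UTF-8, testing the top four bits is enough to identify them.
    four_byte_lead = b[7] & b[6] & b[5] & b[4]
    # Every non-continuation byte starts one code point.
    starts = ~continuation & mask
    # Each code point needs one code unit, plus one more for each
    # 4-byte sequence, which becomes a surrogate pair.
    return bin(starts).count("1") + bin(four_byte_lead).count("1")


# Sample covering 1-, 2-, 3- and 4-byte UTF-8 sequences.
sample = "h\u00e9llo \u20ac\U0001F600".encode("utf-8")
print(utf16_length(sample))  # 9
```

The count can be checked against Python's own codec, for example with
len(sample.decode("utf-8").encode("utf-16-le")) // 2.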
Parabix (parallel bit streams for XML) is a research prototype that is
nevertheless being designed to become the basis for a full XML
processing stack. The working code repository is now available
as an open source code base under OSL 3.0:
http://parabix.costar.sfu.ca/
I am hoping to accelerate development of parabix technology through the
open source model as well as continuing the academic research project
with a team of graduate students who are coming up to speed. I have
also created a spin-off company to oversee commercial development
of the technology.
However, in the context of discussion of XML performance issues and
the next ten years of development of XML technology, I think that
the work is sufficiently well advanced to support the following advice:
Do not assume that XML processing performance is inherently limited
by the nature of present-day character-at-a-time parsing technology.
Intraregister and intrachip parallelism hold out a realistic promise of
dramatic performance improvement on commodity processors.
--
Robert D. Cameron, Ph.D.
Professor of Computing Science, Simon Fraser University
President and CTO, International Characters, Inc.