Re: [xml-dev] XML parsing @ 100MB-1000MB/sec/GHz with Parallel Bit Streams
- From: noah_mendelsohn@us.ibm.com
- To: Rob Cameron <cameron@cs.sfu.ca>
- Date: Mon, 25 Feb 2008 20:43:52 -0500
Rob Cameron wrote:
> I am pleased to announce the availability of parabix-0.40, a
> high-performance XML parsing engine prototype that can parse
> text-oriented XML documents on commodity processors at over
> 200MB/sec per processor GHz and data-oriented XML documents at
> speeds approaching that.
> [...]
> By way of comparison, XML Screamer (Kostoulas et al, WWW 2006) performs
> parsing, validation and business object creation on commodity
> processors at the rate of 23-46 MB/sec per processor GHz (MB/sec/GHz),
> a substantial increase over the cited rate of 2.5-6 MB/sec/GHz for
> traditional validating parsers.
As one of the authors of the XML Screamer paper, let me say that this
sounds like terrific work. I have for a while been curious about the use
of SIMD features of modern processors to go beyond what we did with XML
Screamer. (It's ancient history, but years ago I spent some time
performance optimizing code using the SIMD features that were then
available on IBM mainframes -- the so-called Vector Facility -- and I've
been interested ever since in the use of such SIMD systems to speed text
processing and other systems-style code.) The performance numbers you
quote are impressive, but also very consistent IMO with what one might
expect, in that it was always clear that systems like Screamer were
ultimately limited by the CPU speed of the processors we used rather than
by the memory system, and the SIMD side of these chips should be able to
use the memory system more effectively. I very much look forward to
reading your PPoPP paper. (I happen to be on an airplane just now, and
therefore can't get to the Internet to look for a copy.)
Congratulations on what sounds like a truly impressive step forward in XML
performance!
Noah
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Rob Cameron <cameron@cs.sfu.ca>
02/25/2008 08:13 AM
To: xml-dev@lists.xml.org
cc: (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: [xml-dev] XML parsing @ 100MB-1000MB/sec/GHz with
Parallel Bit Streams
I am pleased to announce the availability of parabix-0.40, a
high-performance XML parsing engine prototype that can parse
text-oriented XML documents on commodity processors at over
200MB/sec per processor GHz and data-oriented XML documents at
speeds approaching that. At this point, this includes correct
parsing of correct documents and dispatch to markup action routines
using an in-line API for XML (ilax). As the parabix stack is built
out to incorporate validation and object creation, I am expecting
overall performance above 100MB/sec/GHz. With linear speed-up on
multicore processors and other improvements, 1000MB/sec/GHz is
foreseeable.
By way of comparison, XML Screamer (Kostoulas et al, WWW 2006) performs
parsing, validation and business object creation on commodity
processors at the rate of 23-46 MB/sec per processor GHz (MB/sec/GHz),
a substantial increase over the cited rate of 2.5-6 MB/sec/GHz for
traditional validating parsers.
This is very good performance for traditional character-at-a-time parsing,
taking advantage of a collection of techniques such as optimization
across layers and schema-based customization. As a benchmark,
100 MB/sec/GHz is cited as the limit on throughput achievable for a
simple character-at-a-time scanning loop.
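To make the unit concrete: MB/sec/GHz is throughput divided by clock
rate, which is the same as bytes processed per thousand clock cycles.
The Python sketch below works through that arithmetic. The function
names are illustrative only, and it assumes 1 MB = 10^6 bytes, which
the figures quoted above do not specify.

```python
def mb_per_sec_per_ghz(num_bytes, seconds, clock_ghz):
    """Throughput normalized by clock rate (assumes 1 MB = 10**6 bytes)."""
    mb_per_sec = num_bytes / 1e6 / seconds
    return mb_per_sec / clock_ghz


def cycles_per_byte(rate_mb_per_sec_per_ghz):
    """Convert MB/sec/GHz into average clock cycles spent per input byte.

    1 MB/sec/GHz = 10**6 bytes per 10**9 cycles, so cycles/byte = 1000 / rate.
    """
    return 1000.0 / rate_mb_per_sec_per_ghz


# Rates mentioned in the message above, expressed as a cycle budget per byte.
for rate in (2.5, 6, 23, 46, 100, 200):
    print(rate, "MB/sec/GHz ->", round(cycles_per_byte(rate), 1), "cycles/byte")
```

Under this reading, 100 MB/sec/GHz corresponds to a budget of 10 clock
cycles per byte, and 200 MB/sec/GHz to 5.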
My research is investigating the development of very high-speed text
processing based on a fundamentally new approach: using parallel bit
streams to represent character data and the SIMD processor capabilities
of commodity CPUs to process these bit streams.
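A minimal sketch of the parallel bit stream idea, not the parabix
implementation: the byte stream is transposed into eight bit streams,
one per bit position, and a character class is then computed with a
handful of bitwise operations over whole streams. Python integers stand
in here for SIMD registers, and the names transpose, match_byte and
positions are hypothetical helpers introduced for illustration.

```python
def transpose(data):
    """Return 8 bit streams; stream k holds bit k (0 = least significant)
    of every byte, with bit i of the stream describing byte position i."""
    streams = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                streams[k] |= 1 << pos
    return streams


def match_byte(streams, length, value):
    """Bit stream with a 1 at every position whose byte equals `value`.

    Each step is one AND (plus a NOT where the target bit is 0) applied
    to all positions at once, instead of one comparison per character.
    """
    mask = (1 << length) - 1
    result = mask
    for k in range(8):
        if (value >> k) & 1:
            result &= streams[k]
        else:
            result &= ~streams[k] & mask
    return result


def positions(stream):
    """List the positions of the 1 bits in a stream."""
    found, pos = [], 0
    while stream:
        if stream & 1:
            found.append(pos)
        stream >>= 1
        pos += 1
    return found


doc = b"<a><b/>text</a>"
left_angles = match_byte(transpose(doc), len(doc), ord("<"))
print(positions(left_angles))
```

The transposition here is written as a plain loop for clarity; the
speed-up described in this message depends on performing both the
transposition and the stream logic with SIMD instructions.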
I have first applied these techniques to the problem of UTF-8 to
UTF-16 transcoding, to achieve end-to-end speed-up of 3X to 25X
compared with standard iconv and similar implementations. The
open source implementation of u8u16 is available at
http://u8u16.costar.sfu.ca/ and the results have just been
presented to ACM PPoPP 2008 in Salt Lake City.
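As a small illustration of how bit streams apply to UTF-8 (a sketch
only, not the u8u16 algorithm), the number of UTF-16 code units needed
for a UTF-8 input can be computed from two character-class streams:
continuation bytes (10xxxxxx) produce no code unit of their own, and
each four-byte lead byte (11110xxx) produces an extra surrogate. The
function names are hypothetical, Python integers stand in for SIMD
registers, and the input is assumed to be valid UTF-8.

```python
def transpose(data):
    """Return 8 bit streams; stream k holds bit k (0 = least significant)
    of every byte, with bit i of the stream describing byte position i."""
    streams = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                streams[k] |= 1 << pos
    return streams


def popcount(stream):
    """Number of 1 bits in a stream."""
    return bin(stream).count("1")


def utf16_code_units(utf8_bytes):
    """Count the UTF-16 code units for valid UTF-8 input (assumption:
    the input has already been validated)."""
    n = len(utf8_bytes)
    mask = (1 << n) - 1
    b = transpose(utf8_bytes)
    # Continuation bytes have the form 10xxxxxx.
    suffix = b[7] & ~b[6] & mask
    # Lead bytes of four-byte sequences have the form 11110xxx.
    lead4 = b[7] & b[6] & b[5] & b[4] & ~b[3] & mask
    # One unit per non-continuation byte, plus one surrogate per lead4.
    return n - popcount(suffix) + popcount(lead4)


sample = "a\u00e9\u65e5\U0001F600"  # 1-, 2-, 3- and 4-byte characters
print(utf16_code_units(sample.encode("utf-8")))
```

The result can be checked against Python's own encoder by comparing it
with len(sample.encode("utf-16-le")) // 2.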
Parabix (parallel bit streams for XML) is a research prototype that is
nevertheless being designed to become the basis for a full XML
processing stack. The working code repository is now available
as an open source code base under OSL 3.0.
http://parabix.costar.sfu.ca/
I am hoping to accelerate development of parabix technology through the
open source model as well as continuing the academic research project
with a team of graduate students who are coming up to speed. I have
also created a spin-off company to oversee commercial development
of the technology.
However, in the context of discussion of XML performance issues and
the next ten years of development of XML technology, I think that
the work is sufficiently well advanced to support the following advice:
Do not assume that XML processing performance is inherently limited
by the nature of present-day character-at-a-time parsing technology.
Intraregister and intrachip parallelism hold out a realistic promise of
dramatic performance improvement on commodity processors.
--
Robert D. Cameron, Ph.D.
Professor of Computing Science, Simon Fraser University
President and CTO, International Characters, Inc.