Re: [xml-dev] XML parsing @ 100MB-1000MB/sec/GHz with Parallel Bit Streams

Rob Cameron wrote:

> I am pleased to announce the availability of parabix-0.40, a
> high-performance XML parsing engine prototype that can parse
> text-oriented XML documents on commodity processors at over
> 200MB/sec per processor GHz and data-oriented XML documents at
> speeds approaching that.

> [...]


> By way of comparison, XML Screamer (Kostoulas et al, WWW 2006) performs
> parsing, validation and business object creation on commodity
> processors at the rate of 23-46 MB/sec per processor GHz (MB/sec/GHz),
> a substantial increase over the cited rate of 2.5-6 MB/sec/GHz for
> traditional validating parsers.

As one of the authors of the XML Screamer paper, let me say that this
sounds like terrific work.  I have for a while been curious about using
the SIMD features of modern processors to go beyond what we did with XML
Screamer.  (It's ancient history, but years ago I spent some time
optimizing code using the SIMD features that were then available on
IBM mainframes -- the so-called Vector Facility -- and I've been
interested ever since in the use of such SIMD systems to speed text
processing and other systems-style code.)  The performance numbers you
quote are impressive, but also very consistent IMO with what one might
expect: it was always clear that systems like Screamer were ultimately
limited by the CPU speed of the processors we used rather than by the
memory system, and the SIMD side of these chips should be able to use
the memory system more effectively.  I very much look forward to
reading your PPoPP paper.  (I happen to be on an airplane just now, and
therefore can't get to the Internet to look for a copy.)
Congratulations on what sounds like a truly impressive step forward in
XML performance!

Noah

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------








Rob Cameron <cameron@cs.sfu.ca>
02/25/2008 08:13 AM
 
        To:     xml-dev@lists.xml.org
        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        [xml-dev] XML parsing @ 100MB-1000MB/sec/GHz with Parallel Bit Streams


I am pleased to announce the availability of parabix-0.40, a
high-performance XML parsing engine prototype that can parse
text-oriented XML documents on commodity processors at over
200MB/sec per processor GHz and data-oriented XML documents at
speeds approaching that.  At this point, this includes correct
parsing of correct documents and dispatch to markup action routines
using an in-line API for XML (ilax).  As the parabix stack is built
out to incorporate validation and object creation, I am expecting
overall performance above 100MB/sec/GHz.  With linear speed-up on
multicore processors and other improvements, 1000MB/sec/GHz is
foreseeable.

By way of comparison, XML Screamer (Kostoulas et al, WWW 2006) performs
parsing, validation and business object creation on commodity
processors at the rate of 23-46 MB/sec per processor GHz (MB/sec/GHz),
a substantial increase over the cited rate of 2.5-6 MB/sec/GHz for
traditional validating parsers.

This is very good performance for traditional character-at-a-time parsing,
taking advantage of a collection of techniques such as optimization
across layers and schema-based customization.  As a benchmark, 
100 MB/sec/GHz is cited as the limit on throughput achievable for a
simple character-at-a-time scanning loop.
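
To make the MB/sec/GHz figures easier to compare, they can be restated
as a budget of CPU cycles per input byte.  The sketch below is only a
unit conversion, not a measurement; the helper name cycles_per_byte is
hypothetical, and it assumes 1 MB = 10^6 bytes and 1 GHz = 10^9 cycles
per second.

```python
def cycles_per_byte(mb_per_sec_per_ghz: float) -> float:
    """Convert a throughput in MB/sec/GHz into CPU cycles spent per byte.

    Assumes 1 MB = 1e6 bytes and 1 GHz = 1e9 cycles/sec (an assumption;
    the original message does not say which megabyte is meant).
    """
    bytes_per_sec_per_ghz = mb_per_sec_per_ghz * 1e6
    return 1e9 / bytes_per_sec_per_ghz


if __name__ == "__main__":
    # Rates quoted in the message above.
    for rate in (2.5, 6, 23, 46, 100, 200, 1000):
        print(f"{rate:>6} MB/sec/GHz = {cycles_per_byte(rate):6.1f} cycles/byte")
```

Under these assumptions, the 100 MB/sec/GHz benchmark corresponds to
10 cycles per byte, 200 MB/sec/GHz to 5 cycles per byte, and
1000 MB/sec/GHz to a single cycle per byte.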

My research is investigating the development of very high-speed text
processing based on a fundamentally new approach:  using parallel bit
streams to represent character data and the SIMD processor capabilities
of commodity CPUs to process these bit streams.
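
The parallel bit stream idea can be illustrated with a small sketch.
This is not parabix code: it uses Python's unbounded integers as
stand-ins for wide SIMD registers, and the function names (transpose,
match_char, positions) are hypothetical.  Each of the 8 bit streams
holds one bit of every input byte, so a character class such as "<"
can be located for all positions at once with a few bitwise
operations.  The transposition here is a plain loop for clarity only.

```python
def transpose(data: bytes) -> list[int]:
    """Split a byte stream into 8 parallel bit streams.

    Stream k holds bit k (0 = least significant) of every byte;
    input position i is stored at bit i of the integer.
    """
    streams = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                streams[k] |= 1 << i
    return streams


def match_char(streams: list[int], length: int, ch: str) -> int:
    """Return a bit stream with a 1 wherever the input byte equals ch.

    One AND per bit stream covers every input position at once.
    """
    all_ones = (1 << length) - 1
    code = ord(ch)
    result = all_ones
    for k in range(8):
        if (code >> k) & 1:
            result &= streams[k]
        else:
            result &= ~streams[k] & all_ones
    return result


def positions(stream: int) -> list[int]:
    """List the positions of the 1 bits in a bit stream."""
    out = []
    i = 0
    while stream:
        if stream & 1:
            out.append(i)
        stream >>= 1
        i += 1
    return out


if __name__ == "__main__":
    text = b"<a><b/></a>"
    lt = match_char(transpose(text), len(text), "<")
    print(positions(lt))  # [0, 3, 7]
```

With real SIMD registers, each bitwise operation in match_char would
cover a register's width of input positions (for example 128) per
instruction, which is the source of the intraregister parallelism.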

I have first applied these techniques to the problem of UTF-8 to
UTF-16 transcoding, to achieve end-to-end speed-up of 3X to 25X
compared with standard iconv and similar implementations.   The 
open source implementation of u8u16 is available at
http://u8u16.costar.sfu.ca/ and the results have just been
presented at ACM PPoPP 2008 in Salt Lake City.

Parabix (parallel bit streams for XML) is a research prototype that is
nevertheless being designed to become the basis for a full XML
processing stack.  The working code repository is now available
as an open source code base under OSL 3.0 at
http://parabix.costar.sfu.ca/

I am hoping to accelerate development of parabix technology through the
open source model as well as continuing the academic research project
with a team of graduate students who are coming up to speed.    I have
also created a spin-off company to oversee commercial development
of the technology.

However, in the context of discussion of XML performance issues and
the next ten years of development of XML technology, I think that
the work is sufficiently well advanced to support the following advice:
Do not assume that XML processing performance is inherently limited
by the nature of present-day character-at-a-time parsing technology.
Intraregister and intrachip parallelism hold out a realistic promise of
dramatic performance improvement on commodity processors.
-- 
Robert D. Cameron, Ph.D.
Professor of Computing Science, Simon Fraser University
President and CTO, International Characters, Inc.

