Re: [xml-dev] No XML Binaries? Buy Hardware


From: Wei Lu <welu@cs.indiana.edu>
To: noah_mendelsohn@us.ibm.com
Date: Sat, 24 Feb 2007 10:54:18 -0500 (EST)



On Fri, 23 Feb 2007 noah_mendelsohn@us.ibm.com wrote:

> Elliotte Harold writes:
>
> > I don't think we've hit the limits of parser performance yet,
>
> I think that asking "what are the theoretical limits" is very important,
> and it's not a question I've seen discussed often enough.  From our paper
> [1]:
>
> > No parser can process input faster than its supporting hardware
> > accesses data, but the additional cost of parsing and
> > validation should be minimized. On a 1 GHz Pentium processor a
> > simple character-scanning loop runs at about 100 Mbytes/second,
> > which is 10 cycles/byte.
>
> > [..]
>
> > On the tests reported in this paper, using the business object
> > API typical of Web Services applications, XML Screamer parses
> > and schema-validates XML at between 23 and 46 Mbytes/sec/GHz;
> > XML Screamer can thus process XML at speeds of roughly 100–200
> > Mbytes/sec on the 4 GHz processors now becoming available.
>
> > [...]
>
> > Using its business object APIs, XML Screamer scans, parses,
> > validates and deserializes at between 22% and 44% of the tested
> > processor's raw character scanning speed. Except insofar as
> > ways can be found to use such processors more efficiently, e.g.
> > by exploiting hardware string test instructions or on chip SIMD
> > accelerators, gains from further tuning or alternative
> > approaches are likely to be modest. XML Screamer's performance
> > is probably not far from the maximum achievable.
>
> In short, we observed that to check well formedness, a parser must at
> least touch each input character.  You can benchmark various
> processor/memory combinations using their most optimized forms of
> character and string comparison and find out how fast they can inspect
> each byte of an input buffer, doing the sorts of character comparisons
> necessary for well formedness checking.   There may be ways to do better
> than we did on particular processors, but I think it's interesting that
> one can set a pretty good bound on how fast XML processing can go.

If we relax the definition of XML well-formedness checking, we can get a
better bound on performance. In the extreme case, if we treat the XML
document as equivalent to a sequence of matched parentheses, and our only
goal is to find the scope of each pair of tags and its relationship to the
other pairs in the document, a lot of work can be saved. Our tests show
that this kind of "pre-parsing" is 6-7x faster than SAX API parsing with
libxml2 [2]. The result of "pre-parsing" can be used to guide parallel XML
processing, lazy XML processing, or other forms of optimization. So I think
the bound on XML processing performance depends on the minimum that the
XML processing actually needs, and we really have more room to push the
bound.
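
To make the "matched parentheses" view concrete, here is a minimal Python
sketch. It is only an illustration under stated assumptions, not the
implementation measured in [2]: the function name preparse and the tuple
layout are hypothetical, and the input is assumed to have no '>' inside
attribute values and no CDATA sections or comments containing '<'.

```python
def preparse(doc):
    """Return element scopes as (name, start, end, parent_index) tuples.

    Illustrative sketch only. Assumes simplified input: no '>' inside
    attribute values, no CDATA sections or comments containing '<'.
    """
    scopes = []  # one entry per element, in document order
    stack = []   # indices into scopes for the currently open elements
    pos = 0
    while True:
        lt = doc.find("<", pos)
        if lt == -1:
            break
        gt = doc.find(">", lt + 1)
        if gt == -1:
            raise ValueError("unterminated tag at offset %d" % lt)
        pos = gt + 1
        body = doc[lt + 1:gt]
        if body.startswith(("?", "!")):
            continue  # skip declarations, processing instructions, comments
        if body.startswith("/"):
            # End tag: close the innermost open element.
            if not stack:
                raise ValueError("unmatched end tag at offset %d" % lt)
            idx = stack.pop()
            name, start, _, parent = scopes[idx]
            if body[1:].strip() != name:
                raise ValueError("mismatched end tag at offset %d" % lt)
            scopes[idx] = (name, start, pos, parent)
            continue
        parts = body.rstrip("/").split()
        if not parts:
            raise ValueError("empty tag at offset %d" % lt)
        parent = stack[-1] if stack else -1
        if body.endswith("/"):
            # Self-closing element: its scope is the tag itself.
            scopes.append((parts[0], lt, pos, parent))
        else:
            scopes.append((parts[0], lt, None, parent))
            stack.append(len(scopes) - 1)
    if stack:
        raise ValueError("unclosed element: %s" % scopes[stack[-1]][0])
    return scopes
```

Each tuple gives the byte range of one element and the index of its
parent, which is the kind of skeleton that could be handed to parallel or
lazy processing. Attribute values, character data, and entity references
are never inspected, and that is where the saved work comes from.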


[2] http://www.cs.indiana.edu/~welu/pxp_grid06.pdf

Wei Lu
Indiana University


> Furthermore, I think our work shows that it is possible to get not too far
> from that bound, for some definition of "not too far" :-).
>
> Noah
>
> [1] http://www2006.org/programme/item.php?id=5011
>
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
