On Mar 23, 2006, at 7:41 PM, Rick Jelliffe wrote:
> Michael Kay wrote:
>
>>> My expectation is that XML parsing can be significantly sped up
>>> with ...
>>
>> I think that UTF-8 decoding is often the bottleneck and the obvious
>> way to speed that up is to write the whole thing in assembler. I
>> suspect the only way of getting a significant improvement (i.e. more
>> than a doubling) in parser speed is to get closer to the hardware.
>> I'm surprised no-one has done it. Perhaps no-one knows how to write
>> assembler any more (or perhaps, like me, they just don't enjoy it).
>>
> Yes. The technique using C++ intrinsics (which is assembler in
> disguise) I gave in my blog (URL in previous post) gives a *four to
> five* times speed increase compared to fairly tight C++ code, for
> the libxml UTF-8 to UTF-16 transcoder, for ASCII-valued data.
In bnux binary XML, UTF-8 transcoding to Java strings typically
accounts for about 20-50% of parsing time, at an overall throughput
of 50-400 MB/s [1]. This is despite the fact that the conversion
routines are highly optimized, taking full advantage of pure or
partially ASCII-valued data, similar in spirit to the technique your
blog mentions (except that it's in Java). Still, I do have some hope
that future VMs with better dynamic optimization logic for memory
prefetching, bulk operations, etc. could make more of a difference
here. Care to explain why a dynamic optimizer couldn't get close to
what those handcoded assembler routines do, particularly considering
modern memory latencies?
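To make the ASCII fast-path idea concrete, here is a minimal sketch of the general technique in Java. The class name and structure are illustrative only (this is not the actual bnux transcoder): scan and copy bytes directly while they remain 7-bit ASCII, and fall back to a general UTF-8 decoder only for the non-ASCII remainder.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch of an ASCII fast path for UTF-8 decoding.
// Not the bnux implementation; names are hypothetical.
public class AsciiFastPathDecoder {

    // Decode UTF-8 bytes to a String, handling a pure-ASCII prefix cheaply.
    public static String decode(byte[] utf8, int off, int len) {
        char[] chars = new char[len];
        int i = 0;
        // Fast path: ASCII bytes (0..127) are non-negative when read as
        // signed Java bytes, and map 1:1 to chars.
        while (i < len && utf8[off + i] >= 0) {
            chars[i] = (char) utf8[off + i];
            i++;
        }
        if (i == len) {
            return new String(chars, 0, len); // pure ASCII: no decoder needed
        }
        // Slow path: the first non-ASCII byte starts a multi-byte sequence,
        // so the remainder is a valid UTF-8 suffix for the general decoder.
        String tail = new String(utf8, off + i, len - i, StandardCharsets.UTF_8);
        return new String(chars, 0, i) + tail;
    }

    public static void main(String[] args) {
        byte[] ascii = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(ascii, 0, ascii.length)); // prints "hello"
    }
}
```

A real transcoder would of course avoid the intermediate allocations and process words rather than single bytes, but the branch structure (cheap ASCII loop, rare general-case fallback) is the essence of the technique.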
On the standard textual XML front: As has been noted, Xerces and
Woodstox can be made to run quite fast, but in practice, few people
know how to configure them accordingly, reliably, and without
compromising conformance.
Overall, configuring textual XML toolkits to reach high levels of
performance often requires substantial time and expertise. For
example, to minimize startup and initialization time for each test,
the same parser/serializer object instance should be reused (pooled),
at least in the presence of many small messages. In our experience,
most textual XML toolkits perform poorly out-of-the-box. Thus, we
expect most real world applications to perform significantly worse
than shown in our experiments, perhaps dramatically so. Real observed
performance is not only a function of capability, but also of
accessibility. Most users would be better off if XML parsers
performed well out-of-the-box, or were self-tuning. Most users can't
afford to study the complex reliability vs. performance interactions
of the myriad more or less static tuning knobs.
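The instance-reuse point above can be sketched with the standard StAX API (Woodstox is one implementation of it). This is an illustrative example, not benchmark code; the class name and the element-counting task are hypothetical. The key is that the factory is created once and shared across many small messages, rather than re-created per message:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative sketch: amortize setup cost by reusing one XMLInputFactory
// (a relatively expensive object) across many small messages.
public class ReusedStaxParser {

    // Created once; reused for every message parsed by this instance.
    private final XMLInputFactory factory = XMLInputFactory.newInstance();

    // Parse one small message with the shared factory, counting elements.
    public int countElements(String xml) throws XMLStreamException {
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int elements = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    elements++;
                }
            }
        } finally {
            reader.close();
        }
        return elements;
    }

    public static void main(String[] args) throws XMLStreamException {
        ReusedStaxParser parser = new ReusedStaxParser();
        // Many small messages, one factory: the per-message cost is parsing,
        // not factory lookup and initialization.
        for (int i = 0; i < 3; i++) {
            System.out.println(parser.countElements("<msg><a/><b/></msg>"));
        }
    }
}
```

Pooling the stream readers themselves goes further still, but even this much (hoisting factory creation out of the per-message path) is a tuning step that many applications miss.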
[1] http://www.gridforum.org/GGF15/presentations/wsPerform_hoschek.pdf
Wolfgang.