xml-dev - Re: [xml-dev] XML Performance in a Transacation

To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] XML Performance in a Transacation
From: Rick Jelliffe <rjelliffe@allette.com.au>
Date: Fri, 24 Mar 2006 14:41:08 +1100
In-reply-to: <200603231325.k2NDPBG8029517@modelo.allette.com.au>
References: <200603231325.k2NDPBG8029517@modelo.allette.com.au>
User-agent: Mozilla Thunderbird 0.6 (X11/20040502)

Michael Kay wrote:

>>My expectation is that XML parsing can be significantly sped up with ...
>
>I think that UTF-8 decoding is often the bottleneck and the obvious way to
>speed that up is to write the whole thing in assembler. I suspect the only
>way of getting a significant improvement (i.e. more than a doubling) in
>parser speed is to get closer to the hardware. I'm surprised no-one has done
>it. Perhaps no-one knows how to write assembler any more (or perhaps, like
>me, they just don't enjoy it).
>  
>
Yes.* The technique using C++ intrinsics (which is assembler in 
disguise) that I gave in my blog (URL in previous post) gives a *four to 
five* times speed increase over fairly tight C++ code for the libxml 
UTF-8 to UTF-16 transcoder, on ASCII-valued data.
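
As an illustration only (this is not the code from the blog post, nor the 
libxml routine), a minimal sketch of an SSE2 fast path for the ASCII case 
might look like the following. The function name ascii_prefix_to_utf16 and 
its signature are hypothetical, and it assumes an x86 compiler that 
provides <emmintrin.h>.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstddef>

// Hypothetical helper (an assumption for illustration, not a libxml API).
// Widens the leading run of ASCII bytes in a UTF-8 buffer to UTF-16 code
// units and returns how many bytes were consumed. It stops at the first
// byte >= 0x80 so that an ordinary scalar decoder can take over from there.
// dst must have room for at least len code units.
std::size_t ascii_prefix_to_utf16(const unsigned char* src, std::size_t len,
                                  char16_t* dst) {
    assert(src != nullptr && dst != nullptr);
    const __m128i zero = _mm_setzero_si128();
    std::size_t i = 0;

    // Fast path: test and widen 16 input bytes per iteration.
    while (i + 16 <= len) {
        __m128i bytes =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        // movemask gathers the top bit of every byte; any set bit means
        // the block contains a non-ASCII byte.
        if (_mm_movemask_epi8(bytes) != 0) break;
        // Interleaving with zero bytes widens 8-bit values to 16-bit
        // values (x86 is little-endian, so the zero is the high byte).
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i),
                         _mm_unpacklo_epi8(bytes, zero));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i + 8),
                         _mm_unpackhi_epi8(bytes, zero));
        i += 16;
    }

    // Scalar tail: finish any remaining ASCII bytes one at a time.
    while (i < len && src[i] < 0x80) {
        dst[i] = src[i];
        ++i;
    }
    return i;
}
```

A caller would hand the remainder of the buffer (from the returned offset 
onwards) to a conventional multi-byte UTF-8 decoder. Whether this reaches 
the four-to-five times figure quoted above depends on the data and the 
CPU; the sketch makes no such claim.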

The fact that this gives such a large increase in what is sometimes 
deemed to be a bottleneck augurs well for XML speed-ups generally. *If* 
the dynamic for optimization were there. Unfortunately, I expect that 
(MS excepted) the people most active in internationalization 
libraries (like Mark Davis' wonderful ICU project at IBM, which feeds 
into Java for example, or libxml) are keener on cross-platform 
portability (in C++ and Java) than on exploiting the fast 
instructions of particular CPUs. 

With modern CPUs and things like SSE2, even the best C++ (without 
intrinsics) or Java code simply cannot compete with C++ with intrinsics 
or with assembler. Java has certainly caught up with plain C++, but the 
introduction of intrinsics has shifted the goalposts for C++ far beyond 
what Java can cope with. The simple reason is that you need to use 
particular data structures (e.g. 16-byte arrays) and particular 
(non-stalling) algorithms, and you would need some kind of pragma or 
tag to tell the compiler that your code is SSE-friendly. Getting a 
compiler to do all this unaided is just too hard, hence the rise of 
intrinsics and the competitive fall of Java/C# etc.
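
To give a feel for what "particular data structures" and "non-stalling 
algorithms" can mean in practice, here is a small hedged sketch. The type 
Block16 and the function first_markup_char are made-up names for this 
example. It assumes SSE2 via <emmintrin.h>, and it scans a 16-byte aligned 
block for '<' or '&' with no per-byte branching in the comparison step.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Hypothetical 16-byte block, aligned so that one aligned load fills a
// whole SSE register.
struct alignas(16) Block16 {
    unsigned char bytes[16];
};

// Returns the index (0..15) of the first '<' or '&' in the block,
// or 16 if neither character occurs.
int first_markup_char(const Block16& b) {
    assert(reinterpret_cast<std::uintptr_t>(b.bytes) % 16 == 0);
    __m128i v = _mm_load_si128(reinterpret_cast<const __m128i*>(b.bytes));

    // Compare all 16 bytes against each target character at once.
    __m128i is_lt  = _mm_cmpeq_epi8(v, _mm_set1_epi8('<'));
    __m128i is_amp = _mm_cmpeq_epi8(v, _mm_set1_epi8('&'));

    // One bit per byte: bit k is set when byte k matched either target.
    int mask = _mm_movemask_epi8(_mm_or_si128(is_lt, is_amp));
    if (mask == 0) return 16;

    // Locate the lowest set bit (portable loop; compilers with a
    // count-trailing-zeros builtin could use that instead).
    int pos = 0;
    while ((mask & 1) == 0) {
        mask >>= 1;
        ++pos;
    }
    return pos;
}
```

The data layout (fixed 16-byte, aligned blocks) is what makes the single 
load and the whole-register compares possible; plain byte-at-a-time code 
gives the compiler little chance of discovering this form on its own.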

Because Open Source XML software generally is either Java or 
cross-platform C++, the dynamic has not been there to add 
processor-dependent optimizations or algorithms. I hope that will 
change. But the change will happen faster with user demand: in 
particular, the organizations with high transaction needs who are 
currently doing such a convincing impersonation of passive beached 
whales. They need to be more proactive in stimulating open source 
development in more efficient XML; it directly addresses their business 
requirements for high transaction rates. Hence my suggestion for a 
consortium offering a cash prize.

There are still a lot of CPUs without SSE out there. I also develop 
digital audio filters for modular synthesizers, as a hobby, and I am 
surprised at the number of older machines still out there. But they are 
disappearing fast.

Cheers
Rick Jelliffe

* Yes for getting closer to the hardware. But assembler is not necessary 
IMHO: compilers generate great code. The big performance potential is 
from utilizing the pipelining/streaming instructions on small arrays, in 
particular SSE2 on x86. You don't need to resort to assembler because of 
the availability of C++ intrinsic functions.

I was surprised by them: I had thought C and C++ had stopped evolving 
in any interesting way in the early 90s and that the action was in 
Java/C#. But these intrinsics utterly change the tradeoffs for Java 
versus C++, back in favour of C++. I (and I think a lot of my generation 
of programmers who switched to Java in the 90s) missed the advent of 
intrinsics this decade. I think Java still wins for WORA Java versus 
WORA C++, but for optimised code, it is no contest: Java simply cannot 
compete with C++ with intrinsics, because Java can only make use of SSE2 
instructions piecemeal, just as normal instructions, and has no 
capability of making use of their pipelining or virtual parallelism.

In DSP, compiling the same C++ code to use SSE instructions usually 
results in a 30% increase in performance. Small, but good. C++ compilers 
with "generate SSE" ticked, or Java compilers, can get this kind of 
advantage now. But the real speed-up, the speed-up I am talking 
about, is to use the SSE(2) code for pipelining, which requires that you 
structure your algorithm in a way to suit the CPU. That is beyond what 
we can expect a Java compiler to do unaided.
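
The difference between the two cases can be sketched roughly as follows. 
This is an illustration under stated assumptions, not code from any real 
filter: the names energy_scalar and energy_sse are hypothetical, and it 
uses the SSE float intrinsics from <xmmintrin.h>. The first function is 
the "same code, recompiled" case; the second restructures the reduction 
into four independent partial sums so that it fits a 128-bit register.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (4 x float)
#include <cassert>
#include <cstddef>

// Plain version: a single accumulator, so every addition depends on
// the result of the previous one.
float energy_scalar(const float* x, std::size_t n) {
    assert(x != nullptr || n == 0);
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) acc += x[i] * x[i];
    return acc;
}

// Restructured version: four partial sums live in one SSE register and
// are updated together, then combined once at the end.
float energy_sse(const float* x, std::size_t n) {
    assert(x != nullptr || n == 0);
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(x + i);           // load 4 samples
        acc = _mm_add_ps(acc, _mm_mul_ps(v, v));  // 4 squares, 4 adds
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    for (; i < n; ++i) total += x[i] * x[i];      // leftover samples
    return total;
}
```

Note that the second version adds the samples in a different order, so 
for general floating-point input its result can differ from the first in 
the last bits. That reordering is a change to the algorithm made by the 
programmer, which is the kind of restructuring the paragraph above has in 
mind.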

So please, no flames or pointers to benchmarks of Java versus C++ based 
on code that is not written to utilize intrinsics: they are so 90s! 

Java needs to add the equivalent of intrinsics (i.e. in the way that 
it has the system-dependent arraycopy method) to catch up with C++. 
There may be some way to make it slightly more CPU-independent (e.g. 
make the array size of SIMD data such as __m128i into a System constant).
