


   Re: [xml-dev] XML Performance in a Transaction


Wolfgang Hoschek wrote:

> This is even though the conversion routines are highly optimized, 
> taking full advantage of pure or partial ASCII valued data, similar 
> in spirit to the technique your blog mentions (except that it's in Java).

Oh, if it is Java it is not really similar in spirit. The point of the 
blog is not scanning ahead for non-ASCII codes but taking advantage 
of the parallelism/pipelining functions in current CPUs, as exposed by 
C++ intrinsic functions, to speed things up.  I apologize for not being 

> I do have some hope that future VMs with better dynamic optimization 
> logic for memory prefetching, bulk operations, etc. could make more 
> of a difference here, though. Care to explain why a dynamic optimizer 
> couldn't get close to what those handcoded assembler routines do, in 
> particular considering modern memory latencies?

It is highly unlikely that a programmer would write code that is readily 
parallelizable* into optimal SSE2 instructions unless they knew SSE2's 
constraints in the first place.  They have to process data in 128-bit 
chunks. The data has to be aligned on certain memory boundaries. Only 
some kinds of data are allowed. Only some kinds of operations are 
available. Almost any call to a function or method will break the 
pipeline.  Expressions have to be written with certain variable-writing 
constraints to prevent pipeline stalling. Expressions have to be written 
to interleave use of different execution units in the CPU.

(*I say parallelizable, because the intrinsics make pipelined 
instructions look to the programmer like parallel instructions.)
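
To make those constraints concrete, here is a minimal sketch (not the code 
from the blog). The function name all_ascii and the scalar fallback are 
assumptions made for illustration; the intrinsics themselves are the 
standard SSE2 ones from <emmintrin.h>. It processes 128-bit chunks, only 
uses aligned loads after a scalar prologue, calls no functions inside the 
loop, and keeps two independent accumulators so that one OR does not have 
to wait on the previous one.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics and the __m128i type
#define ALL_ASCII_SSE2 1
#endif

// Hypothetical helper: true if every byte in [p, p+n) is below 0x80.
bool all_ascii(const unsigned char* p, std::size_t n) {
    std::size_t i = 0;
#ifdef ALL_ASCII_SSE2
    // Scalar prologue: advance until p+i sits on a 16-byte boundary,
    // because _mm_load_si128 requires aligned addresses.
    while (i < n && (reinterpret_cast<std::uintptr_t>(p + i) & 15) != 0) {
        if (p[i] & 0x80) return false;
        ++i;
    }
    // Two independent accumulators, 32 bytes per iteration.
    __m128i acc0 = _mm_setzero_si128();
    __m128i acc1 = _mm_setzero_si128();
    for (; i + 32 <= n; i += 32) {
        acc0 = _mm_or_si128(
            acc0, _mm_load_si128(reinterpret_cast<const __m128i*>(p + i)));
        acc1 = _mm_or_si128(
            acc1, _mm_load_si128(reinterpret_cast<const __m128i*>(p + i + 16)));
    }
    // movemask collects the top bit of each of the 16 bytes; any set bit
    // means some byte in the vectorised range was >= 0x80.
    if (_mm_movemask_epi8(_mm_or_si128(acc0, acc1)) != 0) return false;
#endif
    // Scalar tail (or the whole input when SSE2 is unavailable).
    for (; i < n; ++i) {
        if (p[i] & 0x80) return false;
    }
    return true;
}
```

Nothing in the loop body is something a programmer would stumble into 
without already knowing the 16-byte alignment rule and the available 
operations.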

The reason that current C++ compilers don't attempt to do anything 
sophisticated with parallelization is that it is too hard and too easily 
defeated. Providing built-in intrinsic functions which act on special 
built-in 128-bit data types has turned out to be workable instead.

I think Java's best hopes are
  * add little optimizations like mine to the x86 version of the Java 
libraries, and call as native code;

  * add more functions to System that can use SSE2, but hide it. For 
example, a function to scan a byte array and detect the location of the 
first non-ASCII code value like my example. But the Java designers could 
only do this *after* it becomes clear what the useful functions are, and 
this will only happen *after* programmers have explored using the SSE2 
instructions for non-mathematical uses like parsing;

  * add some kinds of annotations and datatypes to support small-grain 
parallelized/pipelined code, generalizing SSE2 or perhaps even just 
having direct equivalents to SSE intrinsics:  @parallel(128)  ?
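
As a rough illustration of the kind of routine the first two options would 
hide behind a library call, here is a sketch of a "first non-ASCII byte" 
scan. It is not the blog's code: the name first_non_ascii, its signature, 
and the scalar fallback are assumptions for the example. Only the SSE2 
intrinsics from <emmintrin.h> are real API.

```cpp
#include <cstddef>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define FIRST_NON_ASCII_SSE2 1
#endif

// Hypothetical function: index of the first byte >= 0x80 in [p, p+n),
// or n if every byte is ASCII.
std::size_t first_non_ascii(const unsigned char* p, std::size_t n) {
    std::size_t i = 0;
#ifdef FIRST_NON_ASCII_SSE2
    for (; i + 16 <= n; i += 16) {
        // Unaligned load, so the caller need not worry about alignment.
        __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(p + i));
        // One bit per byte: set when that byte's top bit is set.
        int mask = _mm_movemask_epi8(chunk);
        if (mask != 0) {
            // The lowest set bit marks the first non-ASCII byte in the chunk.
            std::size_t k = 0;
            while ((mask & 1) == 0) {
                mask >>= 1;
                ++k;
            }
            return i + k;
        }
    }
#endif
    // Scalar tail (or the whole input when SSE2 is unavailable).
    for (; i < n; ++i) {
        if (p[i] & 0x80) return i;
    }
    return n;
}
```

A Java library could expose something with this contract and keep the 
SSE2 details in native code, so callers only ever see a byte array going 
in and an index coming out.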

> On the standard textual XML front: As has been noted, Xerces and 
> woodstox can be made to run quite fast, but in practice, few people 
> know how to configure them accordingly, and to do so reliably, and 
> without conformance compromises.

A red herring.  Xerces' defaults are an issue unrelated to the merits of 
stimulating software developers to use modern C++ features instead of 
sticking to slow 1990s features.

(In any case, these optimisations are potentially also applicable to 
binary XML parsing as well as to real XML processing.)

> Most users can't afford to study the complex reliability vs. 
> performance interactions of myriads of more or less static tuning knobs.

Same fish.

Rick Jelliffe



Copyright 2001 XML.org. This site is hosted by OASIS