Re: [xml-dev] Speed in Languages and Browser Architectures

From: noah_mendelsohn@us.ibm.com
To: Elliotte Harold <elharo@metalab.unc.edu>
Date: Fri, 2 Mar 2007 16:42:24 -0500

Elliotte Rusty Harold writes:

> If performance were really a concern, and String proved to be the real 
> bottleneck, it's entirely possible someone could write an XML API based 
> on bytes rather than strings. So far I don't think anyone's really had 
> the motivation to do so. Either it hasn't been shown that strings are 
> the problem, or they're not a big enough problem that anyone wants to 
> take the time to fix them.

Yes, but as I wrote earlier, that doesn't necessarily get you out of the 
woods performance-wise.  Bytes give you low-level access, but there is 
still the issue of bounds checks, and also of whether things like scanning 
and pattern matching get translated into the best-performing instruction 
sequences on the particular hardware.  I have seen some byte-level parsers 
written in Java that compete pretty well with optimized C parsers, but in 
the end the C is usually at least as fast, and often faster.  Note that, 
as always in Java, a lot can depend on which built-in methods have been 
optimized on a particular VM for a particular platform.
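To make the bounds-check point concrete, here is a minimal Java sketch of the kind of inner loop a byte-level parser spends its time in: finding the next '<' in a UTF-8 buffer. The class and method names are hypothetical, not from any existing parser. Comparing raw bytes against '<' is safe in UTF-8 because every byte of a multi-byte character has its high bit set, so it can never equal an ASCII delimiter. Each `buf[i]` access is still bounds-checked by the JVM; a JIT can often remove that check for a simple counted loop like this one, but whether it does, and what instruction sequence it emits for the loop, depends on the VM and platform.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: byte-level scanning over a UTF-8 XML buffer. */
public final class ByteScan {
    private ByteScan() {}

    /**
     * Returns the index of the first '<' in buf[from, end), or -1 if none.
     * Every buf[i] read is bounds-checked by the JVM unless the JIT can
     * prove the index is in range and drop the check.
     */
    public static int indexOfLt(byte[] buf, int from, int end) {
        for (int i = from; i < end; i++) {
            // Safe in UTF-8: bytes of multi-byte characters are all >= 0x80.
            if (buf[i] == '<') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] doc = "text<myElement/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(indexOfLt(doc, 0, doc.length)); // prints 4
        System.out.println(indexOfLt(doc, 5, doc.length)); // prints -1
    }
}
```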

Also, regarding APIs:  while it's true that byte-oriented APIs can often 
be faster for XML than string-oriented ones, we've usually found in our 
work that the fastest ones work with some sort of string pool (not 
necessarily based on java.lang.Strings) and some sort of handle to 
represent the strings.  So, to indicate that I found the tag <myElement>, 
I don't pass you either a string or a byte array, I pass you some sort of 
handle (typically an integer) that we've agreed refers to the string 
"myElement".  If you do that in the native encoding of the input, and 
carry the convention all the way up to the deserializer, the consuming 
application, etc., things get faster.  This is a case where having some 
sort of schema can help, as it usually gives you a hint in advance as to 
at least some of what needs to be in the string pool.  Sometimes you can 
use compile-time constants for well-known strings.  Without a schema or 
DTD, all you can do is build up the pool at runtime, or if you have reason 
to believe that multiple instances will use similar vocabularies, share 
the pool from one run to the next.
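A minimal sketch of the handle-based approach, assuming UTF-8 input and a single vocabulary name known in advance (standing in for what a schema or DTD would supply). Names are interned as raw byte ranges in the input's own encoding, never decoded to java.lang.String, and callers receive an int handle. `BytePool`, `MY_ELEMENT`, and the method names are hypothetical and illustrative, not an existing API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical string pool keyed on raw UTF-8 bytes, handing out int handles. */
public final class BytePool {
    // Handle for a name known in advance (e.g. from a schema): a compile-time constant.
    public static final int MY_ELEMENT = 0;

    private final Map<Key, Integer> handles = new HashMap<>();
    private final List<byte[]> names = new ArrayList<>();

    public BytePool() {
        // Preload the well-known vocabulary so handles match the constants above.
        intern("myElement".getBytes(StandardCharsets.UTF_8));
    }

    /** Returns the handle for buf[from, to), adding the name to the pool if new. */
    public int intern(byte[] buf, int from, int to) {
        byte[] copy = Arrays.copyOfRange(buf, from, to);
        Key key = new Key(copy);
        Integer existing = handles.get(key);
        if (existing != null) {
            return existing;
        }
        int handle = names.size();
        names.add(copy);
        handles.put(key, handle);
        return handle;
    }

    public int intern(byte[] name) {
        return intern(name, 0, name.length);
    }

    /** Returns a copy of the bytes the handle refers to. */
    public byte[] bytesOf(int handle) {
        return names.get(handle).clone();
    }

    /** byte[] has identity equals/hashCode, so wrap it to use as a map key. */
    private static final class Key {
        private final byte[] b;
        Key(byte[] b) { this.b = b; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && Arrays.equals(b, ((Key) o).b);
        }
        @Override public int hashCode() { return Arrays.hashCode(b); }
    }

    public static void main(String[] args) {
        BytePool pool = new BytePool();
        byte[] input = "<myElement><other/></myElement>".getBytes(StandardCharsets.UTF_8);
        int h1 = pool.intern(input, 1, 10);   // the bytes of "myElement"
        int h2 = pool.intern(input, 12, 17);  // the bytes of "other"
        System.out.println(h1 == MY_ELEMENT); // prints true
        System.out.println(h2);               // prints 1
    }
}
```

Because handles are plain ints, a consumer can compare `h == BytePool.MY_ELEMENT` or switch on the handle instead of comparing strings. Keeping one `BytePool` instance alive across documents corresponds to sharing the pool from one run to the next. For brevity `intern` copies the range on every lookup; a tuned pool would hash the range in place and copy only on a miss.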

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

References:
- Re: [xml-dev] Speed in Languages and Browser Architectures
  - From: Elliotte Harold <elharo@metalab.unc.edu>
