Re: [xml-dev] Speed in Languages and Browser Architectures
- From: noah_mendelsohn@us.ibm.com
- To: Elliotte Harold <elharo@metalab.unc.edu>
- Date: Fri, 2 Mar 2007 16:42:24 -0500
Elliotte Rusty Harold writes:
> If performance were really a concern, and String proved to be the real
> bottleneck, it's entirely possible someone could write an XML API based
> on bytes rather than strings. So far I don't think anyone's really had
> the motivation to do so. Either it hasn't been shown that strings are
> the problem, or they're not a big enough problem that anyone wants to
> take the time to fix them.
Yes, but as I wrote earlier, that doesn't necessarily get you out of the
woods performance-wise. Bytes give you low-level access, but there are
still the issues of bounds checks, and of whether things like scanning
and pattern matching get translated into the best-performing instruction
sequences on the particular hardware. I have seen some parsers written at
the byte level in Java that compete pretty well with optimized C parsers,
but usually in the end the C is at least as fast, and usually faster.
Note that, as always in Java, a lot can depend on which built-in methods
have been optimized on a particular VM for a particular platform.
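As a rough illustration of the byte-level scanning discussed above, here
is a minimal Java sketch. The class and method names (ByteScan,
scanNameEnd) are hypothetical and not taken from any real parser, and the
loop only looks for a few delimiters rather than validating XML name
characters. Whether the JIT removes the per-access bounds check in a loop
like this depends on the VM and platform.

```java
/** Hypothetical byte-level scanner; a sketch, not a conforming XML parser. */
final class ByteScan {
    /**
     * Returns the index of the first byte in buf[start, end) that ends an
     * element name ('>', '/', or whitespace), or end if none is found.
     * Assumes an ASCII-compatible encoding such as UTF-8, where these
     * delimiter bytes never occur inside a multi-byte character.
     */
    static int scanNameEnd(byte[] buf, int start, int end) {
        int i = start;
        while (i < end) {
            // Each indexed read is subject to a bounds check unless the
            // JIT can prove the index is always in range.
            byte b = buf[i];
            if (b == '>' || b == '/' || b == ' '
                    || b == '\t' || b == '\n' || b == '\r') {
                break;
            }
            i++;
        }
        return i;
    }
}
```

Called on the bytes of <myElement attr='1'> with start 1, this returns
the index of the space that follows the name.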
Also, regarding APIs: while it's true that byte-oriented APIs can often
be faster for XML than string-oriented ones, we've usually found in our
work that the fastest ones work with some sort of string pool (not
necessarily based on java.lang.Strings) and some sort of handle to
represent the strings. So, to indicate that I found the tag <myElement>,
I don't pass you either a string or a byte array, I pass you some sort of
handle (typically an integer) that we've agreed refers to the string
"myElement". If you do that in the native encoding of the input, and
carry the convention all the way up to the deserializer, the consuming
application, etc., things get faster. This is a case where having some
sort of schema can help, as it usually gives you a hint in advance as to
at least some of what needs to be in the string pool. Sometimes you can
use compile-time constants for well-known strings. Without a schema or
DTD, all you can do is build up the pool at runtime or, if you have reason
to believe that multiple instances will use similar vocabularies, share
the pool from one run to the next.
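The handle idea described above might be sketched in Java roughly as
follows. This is only an illustration under stated assumptions: the
SymbolPool class and its methods are hypothetical, names are stored as
UTF-8 bytes rather than in whatever the input's native encoding is, and
the lookup copies the byte range for simplicity, which a tuned
implementation would likely avoid by hashing the bytes in place.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pool mapping names (kept as raw bytes) to integer handles. */
final class SymbolPool {
    /** Wraps a byte[] so it can serve as a HashMap key compared by content. */
    private static final class Key {
        final byte[] bytes;
        final int hash;
        Key(byte[] bytes) {
            this.bytes = bytes;
            this.hash = Arrays.hashCode(bytes);
        }
        @Override public int hashCode() { return hash; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && Arrays.equals(bytes, ((Key) o).bytes);
        }
    }

    private final Map<Key, Integer> handles = new HashMap<>();
    private final List<byte[]> names = new ArrayList<>();

    /** Returns the handle for buf[start, end), adding the name if it is new. */
    int intern(byte[] buf, int start, int end) {
        // Simplification: copies the range on every call, even on a hit.
        Key key = new Key(Arrays.copyOfRange(buf, start, end));
        Integer existing = handles.get(key);
        if (existing != null) {
            return existing;
        }
        int handle = names.size();
        names.add(key.bytes);
        handles.put(key, handle);
        return handle;
    }

    /** Convenience for preloading well-known names, e.g. from a schema. */
    int intern(String name) {
        byte[] b = name.getBytes(StandardCharsets.UTF_8);
        return intern(b, 0, b.length);
    }

    /** Converts a handle back to a String, only when a caller needs one. */
    String nameOf(int handle) {
        return new String(names.get(handle), StandardCharsets.UTF_8);
    }
}

final class SymbolPoolDemo {
    public static void main(String[] args) {
        SymbolPool pool = new SymbolPool();
        // Preload a name known in advance (as a schema might allow).
        int myElement = pool.intern("myElement");

        byte[] input = "<myElement>".getBytes(StandardCharsets.UTF_8);
        // The parser reports the tag as a handle, not a String or byte[].
        int found = pool.intern(input, 1, input.length - 1);
        System.out.println(found == myElement); // prints "true"
    }
}
```

With handles like these, the consuming application can compare an integer
against a preloaded constant instead of comparing strings, and the same
pool object could in principle be kept and reused across documents that
share a vocabulary.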
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------