Glad to see such an, err, "enthusiastic" response. As the web page says,
I'd intended to update this long ago. I've been sidetracked but will try
to get back to it later this month, when I want to compare document size
and processing speed for collections of documents using a common schema.
I'll also try to find the fastest available SAX2 parser to use as an
I'd suggest you don't waste time trying Java serialized versions of DOM
- the results are horrible. You can see some at the bottom of my
document models benchmarks page, at
http://www.sosnoski.com/opensrc/xmlbench/results.html. The main problem
is that all the document representations (DOM, JDOM, dom4j, etc.) are
tree structures of generally small objects, while Java serialization is
optimized for graph structures. It uses (fairly large) handles for each
object, and actually includes the handles in the encoding (as opposed to
just making the values sequential and implicit). This adds a lot of
bloat - Java serialized Xerces DOM ran about twice the size of the text
documents in the tests I've run.
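
For a rough feel of the per-object overhead, here is a minimal sketch that compares the UTF-8 text size of a small tree with the size of the same tree written through ObjectOutputStream. The Elem class and the sample data are hypothetical stand-ins, not the Xerces DOM classes or the documents used on the benchmarks page, so the ratio it prints will not match the figures above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a document model node (assumption, not a real DOM class).
final class Elem implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final String text; // null when the element has no character content
    final List<Elem> children = new ArrayList<>();

    Elem(String name, String text) {
        this.name = name;
        this.text = text;
    }

    Elem add(Elem child) {
        children.add(child);
        return this;
    }

    // Write the subtree as plain XML text (no escaping; the sample data needs none).
    void writeXml(StringBuilder out) {
        out.append('<').append(name).append('>');
        if (text != null) out.append(text);
        for (Elem c : children) c.writeXml(out);
        out.append("</").append(name).append('>');
    }
}

final class SerializedSizeDemo {
    // Build a shallow tree of many small objects, typical of document models.
    static Elem buildSample(int rows) {
        Elem root = new Elem("rows", null);
        for (int i = 0; i < rows; i++) {
            root.add(new Elem("row", null)
                .add(new Elem("id", Integer.toString(i)))
                .add(new Elem("name", "item" + i)));
        }
        return root;
    }

    // Size in bytes of the tree written as UTF-8 XML text.
    static int textSize(Elem root) {
        StringBuilder sb = new StringBuilder();
        root.writeXml(sb);
        return sb.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    // Size in bytes of the tree written with standard Java serialization.
    static int serializedSize(Elem root) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        Elem doc = buildSample(1000);
        System.out.println("text bytes:       " + textSize(doc));
        System.out.println("serialized bytes: " + serializedSize(doc));
    }
}
```

Swapping buildSample for a real parsed document would give numbers closer to the ones on the results page, provided the document model in use is Serializable.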

Alaric Snell wrote:
> - uuugghh, I just ejaculated (sorry, ladies)!
>That's the kind of experiment I was planning to perform this weekend, and the
>kinds of results I imagined getting.
>The only difference is that I'd introduce gzipped versions of the text,
>serialised DOM tree, and XMLS data, including the time taken to deflate and
>inflate the data. Just since people keep raising gzipped text.
>I'll try and do that this weekend...