The time data was just a side issue, since this test is really focused
on memory usage; for consistent time measurements I'd want to run
multiple passes and generally be a lot more careful in testing. Just for
the record, though, this was run on Mandrake 9.1 on an AMD Athlon 2000+
system with 512MB of PC2700 DDR memory. I've adjusted the test code to add less overhead when
generating the data for Strings, and from a quick sample run the String
times now look more in line with what I'd expect based on the other
times. The code's at http://www.sosnoski.com/ObjectSize.java if anyone
wants to try it out themselves, and you can just contact me directly if
you have any questions.
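For readers who want the general shape of this kind of measurement without fetching the file, here is a minimal sketch of a heap-delta test. It is not the actual ObjectSize.java; the class and method names are made up for illustration, and it assumes that Runtime.gc() settles the heap well enough for a before/after reading to be meaningful, which no JVM guarantees.

```java
// Hypothetical sketch of a heap-delta size test; NOT the actual ObjectSize.java.
final class HeapDeltaSketch {

    // Bytes per instance implied by a before/after heap usage reading.
    static long bytesPerInstance(long startUsage, long endUsage, int count) {
        return (endUsage - startUsage) / count;
    }

    // Heap currently in use, after asking (not forcing) the JVM to collect.
    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        rt.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        int count = 200000;
        // Allocate the holder first so it is not counted in the delta.
        Object[] hold = new Object[count];

        long start = usedMemory();
        long t0 = System.currentTimeMillis();
        for (int i = 0; i < count; i++) {
            hold[i] = new Object();
        }
        long elapsed = System.currentTimeMillis() - t0;
        long end = usedMemory();

        // Using hold.length here also keeps the instances reachable until now.
        System.out.println("Base Object size in bytes: "
                + bytesPerInstance(start, end, hold.length));
        System.out.println("Construction time in ms. for " + hold.length
                + " instances: " + elapsed);

        // The Base Object figures quoted in this thread: 1600000 bytes / 200000.
        System.out.println(bytesPerInstance(884792L, 2484792L, 200000)); // 8
    }
}
```

The measured values will vary by JVM, heap settings and platform; only the last line, which replays the numbers from the original post, is deterministic.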
I agree the String size matches up when you look at what goes into one.
Java uses 16-bit Unicode characters, so the 4 characters actually need 8
bytes; I'd expect that even smaller Strings (1-3 characters) would still
need the same size because 8 bytes appears to be the allocation unit size.
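The arithmetic can be written out as a small estimator. This is only a sketch built on the figures quoted in this thread (8-byte object header, 16-byte array overhead, four 4-byte String fields) plus the assumption of an 8-byte allocation unit; the class and method names are hypothetical, and other JVMs will lay objects out differently.

```java
// Hypothetical estimator using the layout figures quoted in this thread
// for the Sun 1.4.2 JVM. All constants are assumptions, not measurements.
final class StringSizeEstimate {
    static final int OBJECT_HEADER = 8;    // overhead of any Object
    static final int ARRAY_HEADER = 16;    // overhead of any array
    static final int STRING_FIELDS = 16;   // size, offset, hash ints + array reference
    static final int ALLOCATION_UNIT = 8;  // assumed allocation granularity

    // Round a byte count up to the next multiple of the allocation unit.
    static int align(int bytes) {
        return (bytes + ALLOCATION_UNIT - 1) / ALLOCATION_UNIT * ALLOCATION_UNIT;
    }

    // Estimated total for a String plus its backing char array.
    static int estimate(int chars) {
        int stringObject = align(OBJECT_HEADER + STRING_FIELDS);
        int charArray = align(ARRAY_HEADER + 2 * chars); // 16-bit chars
        return stringObject + charArray;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " chars: " + estimate(n) + " bytes");
        }
    }
}
```

Under these assumptions, 1 to 4 characters all come out at 48 bytes, and the estimate only steps up to 56 at 5 characters.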
I just ran this using IBM's 1.4.0 JVM for comparison purposes. The
Object size came out larger (16 bytes), as did String (56), but the
other sizes were the same. The IBM JVM was much faster, though, as is
typical of what I've seen in the past; Sun's JVMs are optimized for
long-term performance on complex applications and suffer with simple programs.
From what I remember most of the document models actually have reusable
QName-type objects to represent element and attribute names internally,
so I don't think the overhead of names is a major part of the document
model size multiplier. Instead I'd attribute it to the sheer number of
objects involved. A naive implementation of an element object, for
instance, will have the object itself (including perhaps a parent
reference and other structural information), the (shared) QName, and
ArrayLists or equivalents of both attributes and content components.
Optimizations can eliminate a lot of this overhead, such as by using
lazy creation of the attribute and content lists. That creates the
potential for other problems, though. For example, in at least some
versions of JDOM inspecting the tree actually modified the data
representation because JDOM used live lists. The only way to find out if
attributes were present was to ask for the list of attributes - which
meant the list had to be created and added to the element just so you
could check the size of the list. This meant that looking at the data
increased the memory size of the tree, and also meant that the
representation was not threadsafe even for read-only access. Xerces
still has this problem if you use the deferred node expansion feature
that's turned on by default.
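To make the lazy-list issue concrete, here is a minimal sketch of an element with a lazily created attribute list. It is not JDOM or Xerces code; the class and its methods are hypothetical and only illustrate how a live-list accessor turns a read into a write, next to a check that avoids the allocation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical element class; not taken from JDOM, Xerces or any other model.
final class LazyElementSketch {
    private final String name;
    private List<String> attributes; // stays null until first requested

    LazyElementSketch(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Live-list accessor: the first call allocates the list and stores it in
    // the element, so merely inspecting the tree grows it. The unsynchronized
    // check-then-create is also not safe for concurrent readers.
    List<String> getAttributes() {
        if (attributes == null) {
            attributes = new ArrayList<String>();
        }
        return attributes;
    }

    // Read-only check that never allocates or modifies the element.
    boolean hasAttributes() {
        return attributes != null && !attributes.isEmpty();
    }

    // Exposed only so the effect of getAttributes() can be observed.
    boolean isListAllocated() {
        return attributes != null;
    }

    public static void main(String[] args) {
        LazyElementSketch e = new LazyElementSketch("item");
        System.out.println(e.hasAttributes());        // false
        System.out.println(e.isListAllocated());      // false
        System.out.println(e.getAttributes().size()); // 0
        System.out.println(e.isListAllocated());      // true
    }
}
```

Offering a non-allocating check such as hasAttributes() alongside the live list is one way to keep the memory savings of lazy creation without read-only traversal changing the tree.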
Bob Foster wrote:
>Thank you very much for the information.
>The "time" data can't be interpreted without knowing what sort of
>machine ran the tests. What was its speed, memory size, etc.?
>Also, I have a question about the units. Am I correct in reading that
>the average simple Object construction time was 59ms/200000 = 0.295
>Re String size: Assuming all your numbers are based on Sun JVM 1.4.2,
>there is a 16-byte overhead for any array, an 8-byte overhead for an
>Object, and a String object contains size, offset and hash ints and an
>array reference, another 16 bytes. That's 40 bytes independent of the
>size of the string. If you created the string with new String("abcd"),
>it should require 44 bytes. Don't know where the other 4 bytes are.
>(If String values are associated with element and attribute nodes, they
>probably use more storage than the nodes. A compact element node
>requires only 8 bytes of overhead, 12 bytes for parent, firstChild and
>nextSibling references, and 4 bytes for a String reference, a total of
>Dennis Sosnoski wrote:
> > In older JVMs (1.1.8) even a simple Object would take about 32 bytes.
> > Now that's down to 8 bytes, for the Sun JVM 1.4.2 on Linux:
> > Base Object starting memory usage 884792, ending usage 2484792
> > Base Object size in bytes: 8
> > Base Object construction time in ms. for 200000 instances: 59
> > String (4 characters) starting memory usage 884952, ending usage 10484952
> > String (4 characters) size in bytes: 48
> > String (4 characters) construction time in ms. for 200000 instances: 851
> > Integer starting memory usage 884952, ending usage 4084952
> > Integer size in bytes: 16
> > Integer construction time in ms. for 200000 instances: 135
> > byte Array (0 length) starting memory usage 884952, ending usage 4084952
> > byte Array (0 length) size in bytes: 16
> > byte Array (0 length) construction time in ms. for 200000 instances: 128
> > byte Array (8 length) starting memory usage 884952, ending usage 5684952
> > byte Array (8 length) size in bytes: 24
> > byte Array (8 length) construction time in ms. for 200000 instances: 211
> > Reference Array (8 length) starting memory usage 884952, ending usage 10484952
> > Reference Array (8 length) size in bytes: 48
> > Reference Array (8 length) construction time in ms. for 200000 instances: 628
> > This is from a modified version of the code I used for a JavaWorld
> > article a few years back (the article is now mostly obsolete, so I won't
> > link it). Don't know why Strings are so slow and so large (figure 24
> > bytes for the char array, but that still leaves another 24 bytes just for
> > the String data), but it would certainly account for a lot of the bloat
> > in document models. In general, document models take about 4-8X the
> > document size in bytes (with the low end for documents that are mostly
> > text): http://www.sosnoski.com/opensrc/xmlbench/results.html#size These
> > results are a couple of years old (and from a 1.3.X JVM), but still
> > fairly accurate from what I've seen.
> > - Dennis