xml-dev - Re: [xml-dev] Eclipse: the new Emacs? (and the XML story)

To: Bob Foster <bob@objfac.com>
Subject: Re: [xml-dev] Eclipse: the new Emacs? (and the XML story)
From: Dennis Sosnoski <dms@sosnoski.com>
Date: Wed, 07 Apr 2004 11:35:42 -0700
Cc: "K. Ari Krupnikov" <ari@cogsci.ed.ac.uk>, xml-dev@lists.xml.org
In-reply-to: <40743E21.6030506@objfac.com>
References: <40716EFF.4050405@attglobal.net> <407182A3.5090502@objfac.com> <40729C26.1080806@chipware.com> <4072A4C0.8010504@attglobal.net> <86y8p84qh3.fsf@fdra.lib.aero> <407366E2.6080700@objfac.com> <4073AFC4.1010302@sosnoski.com> <40743E21.6030506@objfac.com>
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030312

The time data was just a side issue, since this test is really focused 
on memory usage; for consistent time measurements I'd want to run 
multiple passes and generally be a lot more careful in testing. Just for 
the record, though, this was run on Mandrake 9.1 on an AMD Athlon 2000+ 
system with 512MB of PC2700 DDR memory. I've adjusted the test code to 
add less overhead when generating the data for Strings, and from a quick 
sample run the String times now look more in line with what I'd expect 
based on the other times. The code's at 
http://www.sosnoski.com/ObjectSize.java if anyone wants to try it out 
themselves, and you can just contact me directly if you have any 
questions.
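
For anyone who just wants the general shape of this kind of test, here
is a minimal sketch of the approach (sample heap usage, allocate a batch
of instances, sample again, divide). It is not the ObjectSize.java file
linked above; the class and method names are made up for illustration,
and the numbers it reports depend heavily on the JVM and collector used.

```java
// Rough per-instance size and construction-time estimate.
// Hypothetical sketch: names are illustrative, results are approximate.
final class ObjectSizeSketch {

    // Heap currently in use, as reported by the runtime.
    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Ask for a few collections so the usage samples are less noisy.
    // System.gc() is only a hint, so this is best effort.
    static void settle() {
        for (int i = 0; i < 4; i++) {
            System.gc();
        }
    }

    // Returns { estimated bytes per instance, construction time in ms }.
    static long[] measure(int count) {
        // The holder array is allocated before the starting sample so
        // that its own size is not charged to the instances.
        Object[] holder = new Object[count];
        settle();
        long startUsage = usedMemory();
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < count; i++) {
            holder[i] = new Object();
        }
        long elapsed = System.currentTimeMillis() - startTime;
        settle();
        long endUsage = usedMemory();
        // Touch the array after sampling so the instances stay reachable.
        if (holder[count - 1] == null) {
            throw new IllegalStateException("instances were not created");
        }
        return new long[] { (endUsage - startUsage) / count, elapsed };
    }

    public static void main(String[] args) {
        long[] result = measure(200000);
        System.out.println("Base Object size in bytes: " + result[0]);
        System.out.println("Construction time in ms. for 200000 instances: "
            + result[1]);
    }
}
```

A single pass like this is good enough for sizes but not for timing,
which is why the times should be read as rough indications only.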

I agree the String size matches up when you look at what goes into one. 
Java uses 16-bit Unicode characters, so the 4 characters actually need 8 
bytes; I'd expect that even smaller Strings (1-3 characters) would still 
need the same size because 8 bytes appears to be the allocation unit size.
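
To make the arithmetic concrete, here is a small sketch of the
accounting. The constants are simply the figures quoted in this thread
for Sun's 1.4.2 JVM (8-byte Object overhead, 16-byte array overhead, 16
bytes of String fields, 8-byte allocation unit); they are assumptions
rather than values read from a running JVM, and the class and method
names are made up for illustration.

```java
// Back-of-the-envelope String size estimate using the per-object
// figures quoted in this thread. All constants are assumptions that
// apply, at best, to Sun JVM 1.4.2 on 32-bit Linux.
final class StringSizeEstimate {
    static final int OBJECT_OVERHEAD = 8;  // overhead of any Object
    static final int ARRAY_OVERHEAD = 16;  // overhead of any array
    static final int STRING_FIELDS = 16;   // size, offset, hash ints + array reference
    static final int ALLOC_UNIT = 8;       // apparent allocation unit size
    static final int BYTES_PER_CHAR = 2;   // Java chars are 16 bits

    // Round a byte count up to the next multiple of the allocation unit.
    static int roundUp(int bytes) {
        return (bytes + ALLOC_UNIT - 1) / ALLOC_UNIT * ALLOC_UNIT;
    }

    // Estimated size of a char[] holding the given number of characters.
    static int charArrayBytes(int length) {
        return roundUp(ARRAY_OVERHEAD + BYTES_PER_CHAR * length);
    }

    // Estimated size of a String object plus its private char[].
    static int stringBytes(int length) {
        return roundUp(OBJECT_OVERHEAD + STRING_FIELDS) + charArrayBytes(length);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " chars: " + stringBytes(n) + " bytes");
        }
    }
}
```

Under these assumptions every String of 1 to 4 characters comes out at
48 bytes, and a fifth character pushes the char[] into the next 8-byte
unit for a total of 56.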

I just ran this using IBM's 1.4.0 JVM for comparison purposes. The 
Object size came out larger (16 bytes), as did String (56), but the 
other sizes were the same. The IBM JVM was much faster, though, as is 
typical of what I've seen in the past; Sun's JVMs are optimized for 
long-term performance on complex applications and suffer with simple 
programs.

From what I remember, most of the document models actually have reusable 
QName-type objects to represent element and attribute names internally, 
so I don't think the overhead of names is a major part of the document 
model size multiplier. Instead I'd attribute it to the sheer number of 
objects involved. A naive implementation of an element object, for 
instance, will have the object itself (including perhaps a parent 
reference and other structural information), the (shared) QName, and 
ArrayLists or equivalents of both attributes and content components.
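
As a rough illustration of what I mean by a naive implementation, here
is a sketch. It is not the layout of any particular document model; all
of the class, field and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Reusable name object, shared by every element or attribute that
// carries the same qualified name. (Hypothetical class.)
final class SharedName {
    final String namespaceUri;
    final String localName;

    SharedName(String namespaceUri, String localName) {
        this.namespaceUri = namespaceUri;
        this.localName = localName;
    }
}

// One attribute: a shared name plus its own value. (Hypothetical class.)
final class Attribute {
    final SharedName name;
    final String value;

    Attribute(SharedName name, String value) {
        this.name = name;
        this.value = value;
    }
}

// Naive element: both lists are created eagerly, even for elements
// that never get an attribute or a child.
final class NaiveElement {
    NaiveElement parent;                       // structural information
    final SharedName name;                     // shared, so cheap per element
    final List<Attribute> attributes = new ArrayList<>();
    final List<Object> content = new ArrayList<>();

    NaiveElement(SharedName name) {
        this.name = name;
    }

    NaiveElement addChild(NaiveElement child) {
        child.parent = this;
        content.add(child);
        return child;
    }

    // Lower bound on objects in this subtree: the element and its two
    // list objects, plus the same for each child element. Attributes,
    // text and the lists' backing arrays are ignored here.
    int minObjectCount() {
        int count = 3;
        for (Object item : content) {
            if (item instanceof NaiveElement) {
                count += ((NaiveElement) item).minObjectCount();
            }
        }
        return count;
    }
}
```

Even in this stripped-down form a root with two empty children already
accounts for at least nine objects, before any backing arrays,
attribute values or text are counted.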

Optimizations can eliminate a lot of this overhead, such as by using 
lazy creation of the attribute and content lists. That creates the 
potential for other problems, though. For example, in at least some 
versions of JDOM inspecting the tree actually modified the data 
representation because JDOM used live lists. The only way to find out if 
attributes were present was to ask for the list of attributes - which 
meant the list had to be created and added to the element just so you 
could check the size of the list. This meant that looking at the data 
increased the memory size of the tree, and also meant that the 
representation was not thread-safe even for read-only access. Xerces 
still has this problem if you use the deferred node expansion feature 
that's turned on by default.
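
The pattern looks roughly like the sketch below. This is not JDOM's or
Xerces' actual code, just a minimal hypothetical class showing how a
lazily created live list turns a read into a write, alongside a query
method that avoids the problem.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical element with a lazily created attribute list.
final class LazyElement {
    private List<String> attributes;   // stays null until first needed

    // Live-list accessor: callers get the real list, so it has to
    // exist. The first call allocates it and stores it in the element,
    // which grows the tree and is an unsynchronized write that can
    // race with other "read-only" threads.
    List<String> getAttributes() {
        if (attributes == null) {
            attributes = new ArrayList<>();
        }
        return attributes;
    }

    // Non-mutating query: answers the common question without
    // allocating anything or touching the element's state.
    boolean hasAttributes() {
        return attributes != null && !attributes.isEmpty();
    }

    // Exposed only so the effect of the calls above can be observed.
    boolean isAttributeListAllocated() {
        return attributes != null;
    }
}
```

Calling hasAttributes() on a fresh element leaves it untouched, while
getAttributes().size() returns the same answer (zero) but leaves an
allocated list behind.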

  - Dennis

Bob Foster wrote:

>Thank you very much for the information.
>
>The "time" data can't be interpreted without knowing what sort of 
>machine ran the tests. What was its speed, memory size, etc.?
>
>Also, I have a question about the units. Am I correct in reading that 
>the average simple Object construction time was 59ms/200000 = 0.295 
>microseconds?
>
>Re String size: Assuming all your numbers are based on Sun JVM 1.4.2, 
>there is a 16-byte overhead for any array, an 8-byte overhead for an 
>Object, and a String object contains size, offset and hash ints and an 
>array reference, another 16 bytes. That's 40 bytes independent of the 
>size of the string.  If you created the string with new String("abcd"), 
>it should require 44 bytes. Don't know where the other 4 bytes are.
>
>(If String values are associated with element and attribute nodes, they 
>probably use more storage than the nodes. A compact element node 
>requires only 8 bytes of overhead, 12 bytes for parent, firstChild and 
>nextSibling references, and 4 bytes for a String reference, a total of 
>24 bytes.)
>
>Bob Foster
>
>Dennis Sosnoski wrote:
> 
> > In older JVMs (1.1.8) even a simple Object would take about 32 bytes.
> > Now that's down to 8 bytes, for the Sun JVM 1.4.2 on Linux:
> >
> > Base Object starting memory usage 884792, ending usage 2484792
> > Base Object size in bytes: 8
> > Base Object construction time in ms. for 200000 instances: 59
> > String (4 characters) starting memory usage 884952, ending usage 10484952
> > String (4 characters) size in bytes: 48
> > String (4 characters) construction time in ms. for 200000 instances: 851
> > Integer starting memory usage 884952, ending usage 4084952
> > Integer size in bytes: 16
> > Integer construction time in ms. for 200000 instances: 135
> > byte Array (0 length) starting memory usage 884952, ending usage 4084952
> > byte Array (0 length) size in bytes: 16
> > byte Array (0 length) construction time in ms. for 200000 instances: 128
> > byte Array (8 length) starting memory usage 884952, ending usage 5684952
> > byte Array (8 length) size in bytes: 24
> > byte Array (8 length) construction time in ms. for 200000 instances: 211
> > Reference Array (8 length) starting memory usage 884952, ending usage 10484952
> > Reference Array (8 length) size in bytes: 48
> > Reference Array (8 length) construction time in ms. for 200000 instances: 628
> >
> > This is from a modified version of the code I used for a JavaWorld
> > article a few years back (the article is now mostly obsolete, so I won't
> > link it). Don't know why Strings are so slow and so large (figure 24
> > bytes for the char[4], but that still leaves another 24 bytes just for
> > the String data), but it would certainly account for a lot of the bloat
> > in document models. In general, document models take about 4-8X the
> > document size in bytes (with the low end for documents that are mostly
> > text); see http://www.sosnoski.com/opensrc/xmlbench/results.html#size
> > for the numbers. These results are a couple of years old (and from a
> > 1.3.X JVM), but still fairly accurate from what I've seen.
> >
> >  - Dennis
> >
>  
>
