- From: Miles Sabin <msabin@cromwellmedia.co.uk>
- To: David Megginson <david@megginson.com>, xml-dev@ic.ac.uk
- Date: Mon, 20 Dec 1999 14:45:35 -0000
David Megginson wrote,
> Stefan Haustein wrote,
> > - building a new object seems some overhead at the first
> > sight, but in JAVA also a new String is a new object...
>
> And that is why most parsers internalize strings rather than
> creating new ones,
This isn't necessarily the best approach. Intern'ing a string
involves a lookup in a JVM-internal hash table. This table is
shared across all threads, and consequently has to be locked
against simultaneous reads and updates. That means we've got
two potential sources of overhead: the lookup itself; and lock
contention between multiple threads trying to access the
table. The former probably isn't a big deal, but the latter
can make for a serious performance hit in heavily threaded
systems, especially on SMP machines. Unless you know there's
not going to be contention (e.g., because you know you're
running single threaded) it's probably wisest *not* to intern.
It's also worth remembering that you've got to _already_ have
a String before you can intern it! If you've just created one
(e.g. from a portion of a char array) then you're only going to
add overhead by doing an intern in addition.
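To make the point concrete, here's a minimal sketch of what a parser does when it pulls a name out of its char buffer, with and without intern'ing. The class and method names are hypothetical, purely for illustration. Note that the interned variant still has to build the same String first; intern() is pure extra work on top of that.

```java
public class InternCost {
    // Build a String from a slice of the parser's char buffer, e.g. an
    // element name. This allocates a new String and copies the chars.
    static String fromBuffer(char[] buf, int start, int length) {
        return new String(buf, start, length);
    }

    // Same thing, interned. The String above must still be created first;
    // intern() then adds a lookup in the JVM-wide string pool, and the
    // freshly built copy is usually discarded afterwards.
    static String fromBufferInterned(char[] buf, int start, int length) {
        return new String(buf, start, length).intern();
    }

    public static void main(String[] args) {
        char[] buf = "<para>text</para>".toCharArray();
        String plain = fromBuffer(buf, 1, 4);          // "para", not pooled
        String pooled = fromBufferInterned(buf, 1, 4); // canonical "para"
        System.out.println(plain.equals(pooled)); // true: same characters
        System.out.println(plain == pooled);      // false: different objects
    }
}
```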
The only possible benefits are,
1. If you've got a pair of Strings that are both *known* to be
intern'ed you can use == for equality comparisons rather
than equals. 'known' is the crucial qualifier here: in my
experience it's most common that only one of a pair of
Strings will be known to be interned, which means that
before we can use == the other has to be intern'ed first ...
which more than wipes out any speedup.
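A small sketch of the two cases in (1), assuming nothing beyond java.lang.String (the helper names are made up for the example). The == shortcut is only sound when both sides are already known to be interned; in the more common case one side has to be intern'ed on the spot, and that lookup is the cost that, on the argument above, eats the saving.

```java
public class InternedEquality {
    // Only valid when BOTH arguments are already known to be interned:
    // an identity check replaces a char-by-char equals().
    static boolean sameInterned(String a, String b) {
        return a == b;
    }

    // The usual situation: only 'known' is interned, so 'other' must be
    // interned before == can be trusted. That pool lookup is the extra
    // cost the == shortcut was meant to avoid.
    static boolean sameByInterning(String known, String other) {
        return known == other.intern();
    }

    public static void main(String[] args) {
        String known = "title";                             // literals are interned
        String parsed = new String("title".toCharArray());  // fresh, not interned
        System.out.println(sameInterned(known, parsed));    // false: different objects
        System.out.println(sameByInterning(known, parsed)); // true
        System.out.println(known.equals(parsed));           // true, no interning needed
    }
}
```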
2. Intern'ed Strings share storage. I can imagine situations
where this _might_ be significant, but they're likely to
be edge cases. Unless you're actually hanging on to
references to large numbers of equal Strings then garbage
collection _should_ recycle the storage allocated to old
ones. Some JVMs might have trouble doing this nicely, but
then the best bet would be to get hold of a better JVM
rather than trying to hack around the problem. Bear in mind
that troublesome JVMs will also cause problems even with
intern'ing ... because, as mentioned in (1), we'll have had
to create a String before we can intern it, and typically
the pre-intern String will be discarded: if gc is slack then
these will pile up even though unreferenced.
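The storage sharing in (2) can be seen by counting distinct String objects by identity rather than by equals(). This is an illustrative sketch only; the class name and the "xml-dev" text are arbitrary. It only shows a difference because every String is held onto in the set; if the references were dropped, plain gc would reclaim the copies anyway, which is the point made above.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class InternSharing {
    // Counts how many distinct String objects (by identity, not equals())
    // back n equal copies of the same text, when all of them are retained.
    static int distinctInstances(int n, boolean intern) {
        Set<String> seen =
            Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        char[] chars = "xml-dev".toCharArray();
        for (int i = 0; i < n; i++) {
            String s = new String(chars); // a fresh object every time
            // When interning, 's' itself becomes garbage straight away:
            // the pre-intern copy mentioned in the text.
            seen.add(intern ? s.intern() : s);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(1000, false)); // 1000
        System.out.println(distinctInstances(1000, true));  // 1
    }
}
```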
> and that's why the SAX characters() and ignorableWhitespace()
> methods use character arrays rather than strings.
This, on the other hand, can bring genuine gains, at the cost
of considerably uglifying the API.
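For comparison, this is roughly what the char-array style buys you: the handler scans the parser's own buffer window in place and never builds a String. A sketch under stated assumptions: it uses the later SAX2 DefaultHandler and the JAXP SAXParserFactory that ship with the JDK, not the SAX1 interfaces under discussion here, and CharCounter/countLetters are hypothetical names. The usual caveat applies: the ch array is only valid for the duration of the callback, and one run of text may arrive in several characters() calls.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CharCounter extends DefaultHandler {
    private int letters;

    // The parser passes a window [start, start + length) into its own
    // buffer; we scan it in place instead of doing
    // new String(ch, start, length) on every callback.
    @Override
    public void characters(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            if (Character.isLetter(ch[i])) {
                letters++;
            }
        }
    }

    // Parses an in-memory document and returns the number of letters
    // seen in its character data.
    static int countLetters(String xml) {
        CharCounter handler = new CharCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.letters;
    }

    public static void main(String[] args) {
        System.out.println(countLetters("<doc><p>Hello</p> <p>world</p></doc>")); // 10
    }
}
```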
Cheers,
Miles
--
Miles Sabin Cromwell Media
Internet Systems Architect 5/6 Glenthorne Mews
+44 (0)20 8817 4030 London, W6 0LJ, England
msabin@cromwellmedia.com http://www.cromwellmedia.com/
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)