   Re: Slowness of JDK 1.1.x String.intern() [was Re: SAX, Java,and Namespa

  • From: Tyler Baker <tyler@infinet.com>
  • To: Tim Bray <tbray@textuality.com>
  • Date: Fri, 05 Feb 1999 15:52:10 -0500

Tim Bray wrote:

> At 10:12 AM 2/5/99 -0800, Jeff Greif wrote:
> >JDK 1.1.7 intern is native, but is slow because it first converts the
> >characters in the string
>
> Actually, the real reason that most XML parsers will *never* use
> built-in intern is because they probably have the name available in a
> character array, and can go look things up in the handcrafted
> table without String-i-fying it - thus skipping several steps
> of work that a built-in intern is going to have to do.  E.g. Lark's
> symbol table is a double array, storing both the character-array
> and String version of each name - you lookup based on the
> character array and return the string if it's already there.  The
> point is that you call new String() only once per unique name.

I do pretty much the exact same thing, except that on each call to
new String() I do something of the form:

new String().intern().
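
A minimal sketch of the kind of table being described, assuming a simple
chained hash keyed on a range of a character array.  The names NameTable,
lookup and allocations are made up for illustration; this is not Lark's
code or any other parser's.  The point it shows is that new String() (plus
intern()) runs only the first time a name is seen:

```java
// Hypothetical symbol table: names are looked up straight from the parser's
// character buffer, and a String is built (and interned) once per unique name.
final class NameTable {
    private static final class Entry {
        final char[] chars;  // character-array form of the name
        final String name;   // interned String form of the same name
        final Entry next;    // next entry in the same bucket

        Entry(char[] chars, String name, Entry next) {
            this.chars = chars;
            this.name = name;
            this.next = next;
        }
    }

    // Fixed bucket count keeps the sketch short; a real table would grow.
    private final Entry[] buckets = new Entry[256];
    private int allocations = 0;

    // Returns the interned String for buf[offset .. offset+length-1].
    String lookup(char[] buf, int offset, int length) {
        int hash = 0;
        for (int i = 0; i < length; i++) {
            hash = 31 * hash + buf[offset + i];
        }
        int index = (hash & 0x7fffffff) % buckets.length;

        // Compare against stored character arrays; no String is created here.
        for (Entry e = buckets[index]; e != null; e = e.next) {
            if (matches(e.chars, buf, offset, length)) {
                return e.name;
            }
        }

        // First occurrence of this name: pay for new String() and intern() once.
        String name = new String(buf, offset, length).intern();
        allocations++;
        buckets[index] = new Entry(name.toCharArray(), name, buckets[index]);
        return name;
    }

    private static boolean matches(char[] stored, char[] buf, int offset, int length) {
        if (stored.length != length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (stored[i] != buf[offset + i]) {
                return false;
            }
        }
        return true;
    }

    // Number of Strings created so far, i.e. the number of unique names seen.
    int allocations() {
        return allocations;
    }
}
```

Looking up the same name twice returns the same String object, so the
caller can compare the result against a literal with ==.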

With interned names, the application can test element names and attribute
names for identity instead of equality.  Since you can't exactly do
something like this in any programming language I know of:

String s = new String("foo");
switch (s) {
  case "foo":
  case "bar":
}

You need to write code like this:

if (s.equals("foo")) {

}
else if (s.equals("bar")) {

}
etc.

When the common case is comparing a name against a lot of strings and then
falling through to a default action (the final else), this can get
expensive.  Calling String.intern() has a one-time cost for the first
occurrence of an element or attribute name, but repeatedly calling
String.equals() can be quite expensive too.

Code of the form:

if (s == "foo")
else if (s == "bar")

is about as fast as an integer compare.  Even though you may take a
small performance hit at the parser level (or DOM level), in the general
case you will be improving things at the application level, even if you
use String.equals(), since the String.equals() method is of the form:

public boolean equals(Object o) {
  if (this == o) {
    return true;
  }

  // Do other string comparing code
}
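
To make the comparison concrete, here is a small sketch that assumes the
parser hands the application interned names.  The class NameDispatch and
the handler labels are invented for illustration.  Note that == only works
if every name really was interned; a String that was not interned falls
through to the default branch even when its characters match:

```java
// Hypothetical application-level dispatch on element names.
final class NameDispatch {
    // Assumes `name` was interned by the parser, so identity tests suffice.
    static String classify(String name) {
        if (name == "foo") {
            return "foo-handler";
        } else if (name == "bar") {
            return "bar-handler";
        } else {
            return "default";
        }
    }

    public static void main(String[] args) {
        char[] buf = {'b', 'a', 'r'};

        // Interned: the same object as the literal "bar".
        String interned = new String(buf).intern();
        System.out.println(classify(interned));

        // Not interned: a distinct object, so == fails although equals() holds.
        String fresh = new String(buf);
        System.out.println(classify(fresh));
        System.out.println(fresh.equals("bar"));
    }
}
```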

Nevertheless, the String.intern() method has a poor implementation.  I
don't know what kind of table the JDK is using under the hood for each
JVM, but whatever implementation Sun is using is pretty lame.
But despite the poor implementation of String.intern(), it is still a win
at the application level to be dealing with Names that are represented as
interned strings.

Tyler


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)





 
