   Re: String interning (WAS: SAX2/Java: Towards a final form)

  • From: Tyler Baker <tyler@infinet.com>
  • To: Miles Sabin <msabin@cromwellmedia.co.uk>
  • Date: Wed, 12 Jan 2000 16:29:18 -0500

Miles Sabin wrote:

> Tyler Baker wrote,
> > Miles Sabin wrote,
> > > [snip: table mapping to intern'd Strings]
> > > Even tho' this only requires one java-intern for each
> > > distinct name it still provides plenty of opportunities for
> > > synchronization collisions.
> >
> > Nope. Names in XML are highly redundant especially for
> > Namespace prefixes. Also, even if the number of calls to
> > String.intern() were significant (which they rarely if ever
> > are), modern Java runtimes have lowered synchronization
> > overhead to be small enough that you don't really have to
> > think about it much in terms of impacting performance
> > anymore.
>
> I think you're making two assumptions that don't always hold.
> Not all java xml applications are one shot, single doctype:
> some continuously parse multiple documents of a variety of
> doctypes in multiple threads. There's not necessarily _any_
> particular upper bound on the number of distinct element and
> attribute names that might be encountered. So there could be
> continual contention for the JVM's intern table.

Of course there is never an upper bound on the number of distinct element or attribute
names in a document, but in general you have far more elements and attributes than you do
distinct element or attribute names. Trying to satisfy a condition that will never happen
in real-world XML use is exactly the same wrong mode of thinking that I think led to how
"Namespaces in XML" came about. The designers, I feel, tried to satisfy all of these
hypothetical conditions without ever thinking about the real-world implications. That is
what you are doing here. It is laudable, but I don't think it has anything to do with
real-world use of XML.

> And I think you're assuming a single processor JVM. The
> synchronization overhead picture is *very* different on multi-
> processors.

Synchronization is synchronization. For most documents, making a call to String.intern()
50-100 times in a 100KB document is a lot less expensive than doing:

if (x.equals("foo")) {

}
else if (x.equals("bar")) {

}
etc...

As opposed to:

if (x == "foo") {

}
else if (x == "bar") {

}
etc.

Calling the equals method repeatedly can get expensive in long if/else chains like these.
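
To make the comparison concrete, here is a minimal sketch of the approach discussed in this thread: a per-parser table that maps each distinct name to its interned String, so String.intern() is called only once per distinct name and later comparisons can use ==. The class and method names (InternedNameDemo, canonical, classify) are hypothetical, made up for illustration, and are not part of SAX2 or any parser; the sketch also assumes a current Java version with generics.

```java
import java.util.HashMap;
import java.util.Map;

public class InternedNameDemo {
    // Hypothetical per-parser table: raw name -> interned String.
    // Not thread-safe; assumes one table per parser instance.
    private final Map<String, String> names = new HashMap<>();

    // Returns the interned instance for a name, calling String.intern()
    // only the first time a distinct name is seen by this table.
    public String canonical(String raw) {
        String cached = names.get(raw);
        if (cached == null) {
            cached = raw.intern();
            names.put(raw, cached);
        }
        return cached;
    }

    // Identity comparison is only valid because the caller passes interned
    // names and Java string literals are themselves interned.
    public static String classify(String name) {
        if (name == "foo") {
            return "is-foo";
        } else if (name == "bar") {
            return "is-bar";
        }
        return "other";
    }

    public static void main(String[] args) {
        InternedNameDemo table = new InternedNameDemo();
        // new String(...) stands in for a name freshly built from a parse buffer.
        String parsed = new String("bar");
        System.out.println(classify(parsed));                  // not interned
        System.out.println(classify(table.canonical(parsed))); // interned
    }
}
```

Note that the == form silently falls through to the last branch if a name reaches it without having been interned, so it only works when every name goes through the table first.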

Tyler


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1