
   Re: SAX drivers bug ... or feature !

  • From: Tyler Baker <tyler@infinet.com>
  • To: Toivo Lainevool <tlainevool@yahoo.com>
  • Date: Sat, 21 Nov 1998 16:56:16 -0500

Toivo Lainevool wrote:

> ---david@megginson.com wrote:
> >
> > Depending on the virtual machine, this could be a killer.  Remember
> > that a medium-sized XML document (such as a book) might have 10,000
> > elements: that would mean an extra 10,000 attribute lists allocated
> > and then garbage collected in what should be only a few seconds of
> > parsing.
> >
>
> If you're worried about the performance of the parser, just setting the
> attributeList to null would be faster than calling
> AttributeListImpl::clear(), which would cause a removeAllElements() on
> each of the underlying member vectors.  If you're cranking away with the
> parser, chances are the low-priority gc task wouldn't be fired while
> you're doing this, unless you hit your memory limit.
>
> If you're worried about memory space, the clear() and the
> resulting removeAllAttributes() would allow you to reuse the
> AttributeListImpl and Vector objects, but removeAllElements() just
> releases their hold on the underlying Strings within the vectors,
> meaning that the Strings, which I assume would account for most of the
> memory, would be left hanging around for the gc to free anyway.
>
> So which of these approaches would result in a more optimized parser
> would highly depend on the size of the document, the amount of memory
> you have available, and the gc algorithm your VM uses.
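
The two strategies weighed above can be sketched as follows. This is only an illustration under stated assumptions: ReusableAttrs and AttrStrategies are hypothetical names, not the SAX AttributeListImpl, and the two parallel Vectors merely mirror the layout the quoted text describes.

```java
import java.util.Vector;

// Hypothetical stand-in for an attribute list backed by parallel Vectors.
final class ReusableAttrs {
    private final Vector<String> names = new Vector<>();
    private final Vector<String> values = new Vector<>();

    void add(String name, String value) {
        names.addElement(name);
        values.addElement(value);
    }

    int length() {
        return names.size();
    }

    String getValue(String name) {
        int i = names.indexOf(name);
        return i < 0 ? null : values.elementAt(i);
    }

    // Empties both Vectors but keeps the Vector objects (and their backing
    // arrays) alive for the next element. The Strings they referenced are
    // only released here; the gc still has to collect them.
    void clear() {
        names.removeAllElements();
        values.removeAllElements();
    }
}

// Hypothetical helper contrasting the two approaches.
final class AttrStrategies {
    private AttrStrategies() {}

    // Strategy 1: reuse one list for every element; no new list allocation.
    static ReusableAttrs reuse(ReusableAttrs attrs) {
        attrs.clear();
        return attrs;
    }

    // Strategy 2: drop the old list and allocate a fresh one per element;
    // cheaper per call, but the old list and its Vectors become garbage.
    static ReusableAttrs replace(ReusableAttrs attrs) {
        return new ReusableAttrs();
    }
}
```

Which of the two is cheaper in practice would have to be measured on the VM in question, as the quoted text says.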

Simply put, the number one killer in XML parsing, as well as in application use of
XML data, is the creation of temporary handler objects.  In the native interface of
our parser, we have utility routines on our CharacterData interface (as
well as the AttributeList interface) for parsing raw booleans, integers, and base64
content, mainly because the java.lang.Integer utility routines only accept Strings )-:

For some applications which are performance sensitive, creating the temporary
String object needed to call Integer.parseInt(String s) can really bog things
down.
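
As a rough illustration of the kind of utility routine meant here, the sketch below parses a decimal integer straight out of a character buffer, so no temporary String is needed on the success path. CharInts and its parseInt(char[], int, int) signature are assumptions made for this example, not the actual API of our parser, and only ASCII digits are handled.

```java
// Hypothetical utility class; the name and signature are illustrative only.
final class CharInts {
    private CharInts() {}

    // Parses a base-10 int from buf[offset .. offset + length) without
    // building a String first. A String is created only for error messages.
    static int parseInt(char[] buf, int offset, int length) {
        if (length <= 0) {
            throw new NumberFormatException("empty input");
        }
        int i = offset;
        int end = offset + length;
        boolean negative = false;
        char first = buf[i];
        if (first == '-' || first == '+') {
            negative = (first == '-');
            i++;
            if (i == end) {
                throw new NumberFormatException("sign without digits");
            }
        }
        // Accumulate as a negative value so Integer.MIN_VALUE fits.
        int limit = negative ? Integer.MIN_VALUE : -Integer.MAX_VALUE;
        int minBeforeMultiply = limit / 10;
        int result = 0;
        for (; i < end; i++) {
            int digit = buf[i] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException(
                    "not a number: " + new String(buf, offset, length));
            }
            if (result < minBeforeMultiply) {
                throw new NumberFormatException("overflow");
            }
            result *= 10;
            if (result < limit + digit) {
                throw new NumberFormatException("overflow");
            }
            result -= digit;
        }
        return negative ? result : -result;
    }
}
```

A SAX characters(char[] ch, int start, int length) callback could pass its arguments straight through, for example CharInts.parseInt(ch, start, length), instead of first building a String for Integer.parseInt.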

I feel, and many others seem to feel, that XML parsers are in the same league as
I/O libraries in terms of their need to be optimized as much as possible,
especially for server environments.  That includes making the parser itself fast
at tokenizing the content, as well as making the handoff of the parsed content to
the application as fast as possible.  From the application developer's perspective,
what is important in a component like an XML parser is first that the component
is fast, and second that the code you have to write to use the component is not
slow, something many tools vendors unfortunately neglect.

Tyler


