   Re: XML tools and big documents

  • From: Tyler Baker <tyler@infinet.com>
  • To: David Megginson <david@megginson.com>
  • Date: Thu, 03 Sep 1998 11:25:59 -0400

David Megginson wrote:

> Don Park writes:
>
>  > > As for the memory issue, I have thought about some sort of LZW
>  > > compression of all of the text in a document tree.  This would
>  > > save a lot of memory, but may slow down building the DOM tree a
>  > > bit.  Any ideas on this?
>  >
>  >
>  > Your performance will suffer and memory problem still remains.
>
> Agreed.  The overhead comes from the node objects, not from the text.
> The biggest hogs can be attributes, especially in the standard SGML
> DTDs which often include dozens of defaulted attributes for each
> document type.  If you can optimise those (allocating nodes only on
> demand and then freeing them as soon as they're not needed), you're
> half-way there.
>
> The second biggest hogs are leaf elements which contain only text.  If
> you can treat those as special cases and allocate only one object for
> each one instead of three (element node, node list, text node), then
> you're another quarter of the way there.

Very true.  In Java, at least, you can avoid allocating a separate object for
the node list by having your Node implementation also implement the NodeList
interface, and allocating a buffer for the children only when it is needed.
You can do the same thing in the Element node for attributes.  This saves a lot
of memory and many of the heap allocations you would otherwise have to make.
Even so, allocating raw Objects in Java is a memory hog to begin with.
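
To make the idea concrete, here is a rough sketch of a node that acts as its
own child list and allocates its child and attribute buffers only on first use.
MiniNode and MiniNodeList are hypothetical stand-ins for the DOM Node and
NodeList interfaces, not the real org.w3c.dom types, and the buffer sizes are
arbitrary assumptions.

```java
import java.util.Arrays;

// Hypothetical stand-in for the DOM NodeList interface (not org.w3c.dom).
interface MiniNodeList {
    int getLength();
    MiniNode item(int index);
}

// A node that serves as its own child list, so no separate list object exists.
class MiniNode implements MiniNodeList {
    private final String name;
    private MiniNode[] children;  // stays null until the first child is appended
    private int childCount;
    private String[] attributes;  // flat name/value pairs, null until first set
    private int attrSlots;        // number of used slots in attributes

    MiniNode(String name) { this.name = name; }

    String getName() { return name; }

    // The node hands itself out as the list of its children.
    MiniNodeList getChildNodes() { return this; }

    void appendChild(MiniNode child) {
        if (children == null) {
            children = new MiniNode[2];
        } else if (childCount == children.length) {
            children = Arrays.copyOf(children, childCount * 2);
        }
        children[childCount++] = child;
    }

    @Override public int getLength() { return childCount; }

    @Override public MiniNode item(int index) {
        return (index >= 0 && index < childCount) ? children[index] : null;
    }

    void setAttribute(String key, String value) {
        for (int i = 0; i < attrSlots; i += 2) {
            if (attributes[i].equals(key)) { attributes[i + 1] = value; return; }
        }
        if (attributes == null) {
            attributes = new String[4];
        } else if (attrSlots == attributes.length) {
            attributes = Arrays.copyOf(attributes, attrSlots * 2);
        }
        attributes[attrSlots++] = key;
        attributes[attrSlots++] = value;
    }

    String getAttribute(String key) {
        for (int i = 0; i < attrSlots; i += 2) {
            if (attributes[i].equals(key)) return attributes[i + 1];
        }
        return null;
    }

    // Exposed only so the lazy allocation can be observed.
    boolean hasChildBuffer() { return children != null; }
}
```

A leaf node built this way costs one object plus its name; the arrays appear
only once a child or an attribute is actually added.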

> PIs, doctype declarations, notations, etc. are rare enough that you
> don't gain much by optimising them.  Your mileage on comments, entity
> references and CDATA sections may vary, but you're probably best
> skipping them or replacing them with their contents when you build the
> tree, unless your application has very specialised requirements.

This is very true.  For large documents, whether heavily document-oriented or
transaction-oriented, I still think that compressing all of the text in the
document tree may have some promise.  I guess that before spending any more
time talking about it, I should spend the necessary hours to just do it.
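
As a starting point for that experiment, here is a minimal sketch of a text
holder that keeps its content compressed and inflates it on demand.  It uses
DEFLATE through java.util.zip rather than LZW, simply because that is what the
JDK ships; CompressedText and its method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical holder for the text of one node, stored in compressed form.
class CompressedText {
    private final byte[] packed;   // DEFLATE-compressed UTF-8 bytes
    private final int rawLength;   // length of the uncompressed UTF-8 bytes

    CompressedText(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        rawLength = raw.length;
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        packed = out.toByteArray();
    }

    int packedSize() { return packed.length; }

    int rawSize() { return rawLength; }

    // Inflates the stored bytes each time the text is requested.
    String getText() {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        byte[] raw = new byte[rawLength];
        try {
            int off = 0;
            while (off < rawLength && !inflater.finished()) {
                off += inflater.inflate(raw, off, rawLength - off);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed text", e);
        } finally {
            inflater.end();
        }
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

Comparing packedSize() with rawSize() over a real document would show whether
this pays off.  Short strings come out larger than they went in because of the
stream header and checksum, so compressing each small text node separately may
well lose, which fits the objection that the node objects rather than the text
are the real cost.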

Tyler


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)





 
