
   XML parsing memory overhead concerns

  • From: Paul Miller <stele@fxtech.com>
  • To: xml-dev <xml-dev@ic.ac.uk>
  • Date: Thu, 16 Dec 1999 18:01:03 -0500

As I've written before I've been working on a callback-based streaming
XML parser that is sort of DOM-like, specifically for reading
application data from XML files where you know what the object hierarchy
is. At first I tried layering my work over expat to no avail, since
expat uses a push model and I needed a pull model (so a subelement could
parse its subtree by itself).
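
To make the push/pull distinction concrete, here is a minimal sketch of a pull-style reader in C. It is not the parser described in this message and it is not expat's API: the names PullReader, pull_next and count_subtree are hypothetical, and the toy tokenizer assumes ASCII input with no attributes, comments or entities. The point is only that the caller asks for the next event, so a subelement handler can consume its own subtree and then return.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical event types for a toy pull reader (not expat's API). */
typedef enum { EV_START, EV_END, EV_TEXT, EV_EOF } EventType;

typedef struct {
    FILE *in;        /* input stream, read one character at a time */
    EventType type;  /* type of the most recent event */
    char name[64];   /* element name for EV_START / EV_END */
    char text[256];  /* character data for EV_TEXT (truncated if longer) */
} PullReader;

/* Pull the next event. ASCII only, no attributes, no validation. */
static EventType pull_next(PullReader *r)
{
    int c;
    size_t n = 0;

    assert(r != NULL && r->in != NULL);
    c = fgetc(r->in);
    if (c == EOF)
        return r->type = EV_EOF;

    if (c == '<') {
        c = fgetc(r->in);
        r->type = (c == '/') ? EV_END : EV_START;
        if (c == '/')
            c = fgetc(r->in);
        /* Copy the tag name up to the closing '>'. */
        while (c != EOF && c != '>') {
            if (n + 1 < sizeof r->name)
                r->name[n++] = (char)c;
            c = fgetc(r->in);
        }
        r->name[n] = '\0';
        return r->type;
    }

    /* Character data runs up to the next '<', which is pushed back. */
    while (c != EOF && c != '<') {
        if (n + 1 < sizeof r->text)
            r->text[n++] = (char)c;
        c = fgetc(r->in);
    }
    if (c == '<')
        ungetc(c, r->in);
    r->text[n] = '\0';
    return r->type = EV_TEXT;
}

/* A subelement handler that parses its own subtree: called just after
   the element's start tag, it pulls events until the matching end tag
   and returns the number of nested elements it saw. */
static int count_subtree(PullReader *r)
{
    int depth = 1, count = 0;

    while (depth > 0 && pull_next(r) != EV_EOF) {
        if (r->type == EV_START) {
            depth++;
            count++;
        } else if (r->type == EV_END) {
            depth--;
        }
    }
    return count;
}
```

A caller would zero a PullReader, set its in field to an open FILE, call pull_next until it sees the start tag it cares about, and then hand the reader to a handler such as count_subtree. With a push parser the library drives the loop instead, so the handler cannot simply keep reading until its own end tag.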

Now that I am almost finished with the first cut and about to release
it, I was explaining the solution to a colleague, who said I should have
tried to build it on expat anyway. The only way I could have done that
and kept the sub-element parsing model that I want is to have expat
parse the entire document into one big internal memory buffer. One of the
advantages of my solution is that you only need a small file buffer, since
it's streaming. With the expat approach, a data file with 100,000
elements would have to be stored in memory in its entirety, along with a
couple of extra megabytes of housekeeping data. My colleague said "so what?".

So, here is my plea for feedback about memory usage concerns. My current
solution works as designed and streams into a small buffer, but it only
supports ASCII and doesn't validate, and relies on C-style callback
functions, which can require slightly more code. If I weren't worried
about the memory I could rewrite my design on top of expat (gaining all
of its benefits, and presumably validation in the future), and provide
an optional DOM-like interface (without all the extra DOM mumbo-jumbo).
Even then, it would be possible to combine the streaming and the parsing
and throw away the housekeeping data when the elements are no longer
needed (such as when we've moved on to the next subelement tree).
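
The idea of throwing away housekeeping data once a subtree is finished can be sketched roughly as follows. Everything here is an assumption for illustration: the Node layout, the helper names and the "record" element with two children are hypothetical, and no real parser is involved. The sketch only shows that if each subtree is freed before the next one is built, peak memory follows the largest subtree, not the whole document.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-element housekeeping record. */
typedef struct Node {
    char name[32];
    struct Node *first_child;
    struct Node *next_sibling;
} Node;

static long live_nodes = 0;  /* nodes currently allocated */

/* Allocate a node and link it under parent (parent may be NULL). */
static Node *node_new(Node *parent, const char *name)
{
    Node *n = calloc(1, sizeof *n);
    if (n == NULL)
        return NULL;
    strncpy(n->name, name, sizeof n->name - 1);
    if (parent != NULL) {
        n->next_sibling = parent->first_child;
        parent->first_child = n;
    }
    live_nodes++;
    return n;
}

/* Free a node together with everything below it. */
static void subtree_free(Node *root)
{
    Node *child;

    if (root == NULL)
        return;
    child = root->first_child;
    while (child != NULL) {
        Node *next = child->next_sibling;
        subtree_free(child);
        child = next;
    }
    free(root);
    live_nodes--;
}

/* Simulate reading record_count top-level subtrees one at a time,
   discarding each before the next is built. Returns the peak number
   of nodes that were alive at once. */
static long process_records(long record_count)
{
    long i, peak = 0;

    for (i = 0; i < record_count; i++) {
        Node *rec = node_new(NULL, "record");
        if (rec == NULL)
            break;
        node_new(rec, "name");
        node_new(rec, "value");
        if (live_nodes > peak)
            peak = live_nodes;
        /* The application would copy what it needs out of rec here. */
        subtree_free(rec);
    }
    return peak;
}
```

In this sketch, processing 100,000 records never holds more than one record's nodes at a time, which is the trade-off in question: the DOM-like convenience is kept per subtree while the whole-document cost is avoided.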

What do people think? Spare the memory and provide a simpler (and
slightly less capable) solution or store the entire thing in memory and
use the nice stuff in expat and give more features?

--
Paul Miller - stele@fxtech.com

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)






 
