   Re: [xml-dev] Handling very large instance docs


On Thursday 29 April 2004 3:28 pm, you wrote:
> > >At the very least I need to be able to sequentially process a large
> > >document and extract an identified sub-tree (ideally denoted by an
> > >XPath expression) for run-of-the-mill tools to manipulate. I assume
> > >such a beast would need to be based on a SAX parser.
> >
> > I did exactly that in Python.  I considered building an engine that
> > could filter SAX events to those that match a limited version of
> > XPath, but ran out of gas.  I ended up with a just regular SAX
> > application.
> Interesting - I always thought such a thing is useful, but haven't
> come across implementation.

The main problem is obviously getting a good range of expression types to
evaluate correctly and at high performance; it's a hard problem. A good
starting point for research in this area is http://xmltk.sourceforge.net/.
The software there is somewhat behind in functional terms, but as a free and
easy solution for manipulating large documents it's good value.
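
For what it's worth, here is a minimal sketch of the kind of SAX filter
discussed above, in Python since that is what was mentioned. It is not how
xmltk works, and it handles only an absolute path of plain element names
(no predicates, wildcards or namespaces). The class name SubtreeExtractor
and the sample path are made up for illustration.

```python
import xml.sax
from xml.sax.saxutils import XMLGenerator


class SubtreeExtractor(xml.sax.ContentHandler):
    """Re-serialise every element whose path from the root equals `path`.

    Hypothetical helper: supports only paths such as /root/items/item.
    """

    def __init__(self, path, out):
        super().__init__()
        self.target = path.strip("/").split("/")
        self.stack = []          # names of the currently open elements
        self.depth_in_match = 0  # > 0 while inside a matching sub-tree
        self.writer = XMLGenerator(out, encoding="utf-8")

    def startElement(self, name, attrs):
        self.stack.append(name)
        # Start copying at a matching element, and keep copying its children.
        if self.depth_in_match or self.stack == self.target:
            self.depth_in_match += 1
            self.writer.startElement(name, attrs)

    def endElement(self, name):
        if self.depth_in_match:
            self.writer.endElement(name)
            self.depth_in_match -= 1
        self.stack.pop()

    def characters(self, content):
        if self.depth_in_match:
            self.writer.characters(content)


def extract(source, path, out):
    """Stream `source` (file name or file object) and copy matches to `out`."""
    xml.sax.parse(source, SubtreeExtractor(path, out))
```

Memory use depends on the nesting depth of the document rather than its
size, because only the stack of open element names is kept. Something like
extract("big.xml", "/root/items/item", sys.stdout) would write the matching
sub-trees out for ordinary tools to pick up.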

At the 200-300MB level I would not rule out XSLT as a solution, although you
would have to set up your environment carefully, in particular the available
memory and the choice of XSLT processor.


