Re: [Question] How to do incremental parsing?
- From: James Strachan <firstname.lastname@example.org>
- To: Tony.Coates@reuters.com, email@example.com
- Date: Wed, 04 Jul 2001 12:18:42 +0100
> On 04/07/2001 01:27:28 "Xu, Mousheng (SEA)" wrote:
> >A problem with all the current XML parsers is that they at least read the
> >whole XML document into the input stream, which can consume a lot of
> >memory when the XML is big (e.g. 1 GB).
> So, "use SAX or a persistent DOM" for large XML files/streams is what I
> would recommend.
I agree with David and Tony that both direct SAX and persistent DOMs can be
good solutions.
One alternative you might find useful is to use a document object model to
parse your large document but do it in a 'pruning mode'. Massive documents
(e.g. 1GB) are often database generated and contain many 'rows' (document
fragments) which can be processed individually without requiring the entire
document in memory at once.
For example, the dom4j project has an event-based callback mechanism, similar
to SAX, which can be used to process the 'rows' of a massive document in a
row-by-row fashion; each row can then be pruned from the tree when you are
finished with it and garbage collected.
The neat thing about this is that you are called back with a complete, valid
Document object that contains only one row (<product>) at a time, and you can
still use dom4j's XPath support on all aspects of the Document as well.
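The pruning approach described above can be sketched roughly as follows. This
is only an illustration, not the exact FAQ example: the element path
"/catalog/product", the file name "big.xml", and the child element "name" are
all assumptions about the document's shape.

```java
import java.io.File;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.ElementHandler;
import org.dom4j.ElementPath;
import org.dom4j.io.SAXReader;

public class PruningDemo {
    public static void main(String[] args) throws Exception {
        SAXReader reader = new SAXReader();

        // Register a handler for every <product> row under the root.
        // (The path "/catalog/product" is illustrative.)
        reader.addHandler("/catalog/product", new ElementHandler() {
            public void onStart(ElementPath path) {
                // called when a <product> start tag is encountered
            }

            public void onEnd(ElementPath path) {
                // at this point the <product> subtree is fully built,
                // so normal dom4j navigation and XPath both work on it
                Element row = path.getCurrent();
                String name = row.valueOf("name"); // hypothetical child
                System.out.println("processed product: " + name);

                // prune the row from the tree so the memory it occupies
                // can be reclaimed by the garbage collector
                row.detach();
            }
        });

        // The returned Document contains only whatever was not pruned,
        // so memory use stays roughly constant regardless of file size.
        Document doc = reader.read(new File("big.xml"));
    }
}
```

Because each row is detached inside onEnd, the tree never holds more than one
row's worth of nodes at a time, which is what keeps a 1 GB document tractable.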
There's an example in the FAQ here:-