RE: [Question] How to do incremental parsing?

One could say that whenever you're manipulating 1 GB XML documents, be it
using SAX or DOM, you're in trouble... What you need is an XML database, or
no XML at all (e.g. a relational database).


>-----Original Message-----
>From: Tony.Coates@reuters.com [mailto:Tony.Coates@reuters.com]
>Sent: Wednesday, 4 July 2001 12:22
>To: xml-dev@lists.xml.org
>Subject: Re: [Question] How to do incremental parsing?
>On 04/07/2001 01:27:28 "Xu, Mousheng  (SEA)" wrote:
>>A problem of all the current XML parsers is that they at least read the
>>whole XML document into the input stream, which can consume a lot of memory
>>when the XML is big (e.g. 1 GB).
>You will generally be told "use SAX not DOM" for large 
>files/streams.  That's OK if your application can deal with 
>the data in your XML in a localised fashion.  And, it has to 
>be said, designing your XML formats to work within the 
>constraints of SAX can be a good exercise in avoiding 
>structures that require backtracking through the document when 
>they are processed.
>Still, it often is necessary to backtrack, or make connections 
>between parts of a document that may be widely separated in 
>the file/stream.  In this case, you want to be able to use 
>something more like DOM, because the SAX alternative here 
>would require you to build a store of the information that has 
>been parsed, and that means (a) writing more code than you 
>might like to, and (b) possibly storing as much information as 
>a DOM tree would anyway.  A useful way forward for these kinds 
>of problems seems to be persistent DOMs built into databases, 
>such as have been appearing recently.  The DOM 
>tree is then paged into memory as required.  Of course, this 
>is slower than holding the whole DOM tree in memory, but the 
>fact is that databases are fast enough to do real stuff with, 
>and if that is true for relational tables, it should hold true 
>for persistent DOMs.
>So, "use SAX or a persistent DOM" for large XML files/streams 
>is what I would suggest.
>     Cheers,
>          Tony.
>Anthony B. Coates
>Leader of XML Architecture & Design
>Chief Technology Office
>Reuters Plc, London.
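
For the "localised" case described in the quoted message, here is a minimal sketch of incremental SAX parsing, assuming Python's standard xml.sax module (the thread itself names no language or parser). The names ElementCounter and count_elements are hypothetical, chosen only for illustration. The input is fed to the parser one chunk at a time, and the handler keeps nothing but running totals, so memory use depends on the chunk size and the number of distinct element names, not on the size of the document.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical handler: keeps only running totals per element name."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per start tag; nothing from the document is retained
        # beyond this counter.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(stream, chunk_size=64 * 1024):
    """Parse a binary file-like object chunk by chunk and count elements."""
    handler = ElementCounter()
    parser = xml.sax.make_parser()  # expat-based; supports feed()/close()
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)  # a chunk may end in the middle of a tag
    parser.close()  # signals end of input; raises if the XML is incomplete
    return handler.counts


if __name__ == "__main__":
    sample = io.BytesIO(b"<orders><order id='1'/><order id='2'/></orders>")
    # A deliberately tiny chunk size, to show that tags split across
    # chunks are handled by the parser.
    print(count_elements(sample, chunk_size=8))
```

This only covers processing that can be done in one forward pass. As the quoted message points out, anything that needs backtracking would require the handler to store what it has seen, which is where a persistent DOM becomes attractive.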