RE: [Question] How to do incremental parsing?
- From: Nicolas LEHUEN <nicolas.lehuen@ubicco.com>
- To: "'xml-dev@lists.xml.org'" <xml-dev@lists.xml.org>
- Date: Wed, 04 Jul 2001 12:48:04 +0200
One could say that whenever you're manipulating 1 GB XML documents, be it
using SAX or DOM, you're in trouble... What you need is an XML database, or
no XML at all (e.g. a relational database).
Regards,
Nicolas
>-----Original Message-----
>From: Tony.Coates@reuters.com [mailto:Tony.Coates@reuters.com]
>Sent: Wednesday, 4 July 2001 12:22
>To: xml-dev@lists.xml.org
>Subject: Re: [Question] How to do incremental parsing?
>
>
>
>On 04/07/2001 01:27:28 "Xu, Mousheng (SEA)" wrote:
>
>>A problem of all the current XML parsers is that they at least read the
>>whole XML document into the input stream, which can consume a lot of memory
>>when the XML is big (e.g. 1 GB).
>
>You will generally be told "use SAX not DOM" for large
>files/streams. That's OK if your application can deal with
>the data in your XML in a localised fashion. And, it has to
>be said, designing your XML formats to work within the
>constraints of SAX can be a good exercise in avoiding
>structures that require backtracking through the document when
>they are processed.
>
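As a rough illustration of the localised SAX style described above, here is a
minimal sketch using Python's standard xml.sax module. The <item> element name
and the helper names (ItemCounter, count_items, read_chunks) are assumptions
made up for the example, not anything from the original question. The parser
is fed one chunk at a time, so memory use depends on the chunk size and the
handler's own state rather than on the document size.

```python
import xml.sax


class ItemCounter(xml.sax.ContentHandler):
    """Counts <item> elements (hypothetical name); keeps no per-element state."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1


def count_items(chunks):
    """Feed the parser chunk by chunk instead of loading the whole document."""
    parser = xml.sax.make_parser()
    handler = ItemCounter()
    parser.setContentHandler(handler)
    for chunk in chunks:
        # Chunks may split tags at arbitrary byte offsets; the parser buffers
        # the incomplete piece until the next feed() call.
        parser.feed(chunk)
    parser.close()
    return handler.count


def read_chunks(path, size=64 * 1024):
    """Yield a file's bytes in fixed-size chunks."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(size)
            if not chunk:
                break
            yield chunk
```

Something like count_items(read_chunks("big.xml")) would then process a large
file without ever holding it all in memory, as long as the handler itself
stays small.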
>Still, it often is necessary to backtrack, or make connections
>between parts of a document that may be widely separated in
>the file/stream. In this case, you want to be able to use
>something more like DOM, because the SAX alternative here
>would require you to build a store of the information that has
>been parsed, and that means (a) writing more code than you
>might like to, and (b) possibly storing as much information as
>a DOM tree would anyway. One useful way forward for these
>kinds of problems seems to be the persistent DOMs built into
>databases that have been appearing recently. The DOM
>tree is then paged into memory as required. Of course, this
>is slower than holding the whole DOM tree in memory, but the
>fact is that databases are fast enough to do real stuff with,
>and if that is true for relational tables, it should hold true
>for persistent DOMs.
>
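To make the "store of parsed information" idea concrete, here is a small
sketch that parses with SAX but writes what it sees into a database, so that
widely separated parts of the document can be connected afterwards. This is
not a persistent DOM product, only an illustration of the same trade-off using
Python's standard sqlite3 and xml.sax modules. The <def id="..."> and
<ref target="..."> elements, the table layout, and the function names are all
assumptions invented for the example.

```python
import io
import sqlite3
import xml.sax


class StoreHandler(xml.sax.ContentHandler):
    """Records definitions and references in a database as they stream past."""

    def __init__(self, db):
        super().__init__()
        self.db = db
        self.position = 0  # running count of start tags, used as a location

    def startElement(self, name, attrs):
        self.position += 1
        if name == "def":
            self.db.execute("INSERT INTO defs (id, pos) VALUES (?, ?)",
                            (attrs["id"], self.position))
        elif name == "ref":
            self.db.execute("INSERT INTO refs (target, pos) VALUES (?, ?)",
                            (attrs["target"], self.position))


def resolve_references(xml_source, db_path=":memory:"):
    """Parse once with SAX, then let the database join refs to their defs."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE defs (id TEXT PRIMARY KEY, pos INTEGER)")
    db.execute("CREATE TABLE refs (target TEXT, pos INTEGER)")
    xml.sax.parse(xml_source, StoreHandler(db))
    db.commit()
    rows = db.execute(
        "SELECT refs.pos, defs.pos FROM refs "
        "JOIN defs ON defs.id = refs.target ORDER BY refs.pos"
    ).fetchall()
    db.close()
    return rows


if __name__ == "__main__":
    doc = b'<doc><ref target="b"/><def id="a"/><def id="b"/><ref target="a"/></doc>'
    # Each row pairs the position of a reference with that of its definition.
    print(resolve_references(io.BytesIO(doc)))
```

Pointing db_path at a file rather than ":memory:" keeps the store on disk, so
the database pages data in as needed. That is slower than an in-memory tree,
but the store no longer has to fit in memory.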
>So, "use SAX or a persistent DOM" for large XML files/streams
>is what I would suggest.
>
> Cheers,
> Tony.
>========
>Anthony B. Coates
>Leader of XML Architecture & Design
>Chief Technology Office
>Reuters Plc, London.
>tony.coates@reuters.com
>========