
Re: [Question] How to do incremental parsing?



Hiya.

Or the third alternative is: don't use XML to actually work with your data.
Sometimes XML is the perfect fit to work with, sometimes it's not.

XML is very good for data interchange, but often for large datasets
developers can bend themselves out of shape to accommodate XML, when simply
resorting to a traditional RDBMS would be a lot quicker. Even in such
applications XML still has a pivotal role for reading into, and writing out
of, the application (serialising data), but a big question mark has to be
drawn over the DOM with regard to its suitability as an interface for large
datasets.

The persistent DOM is an interesting idea, and one I've thought about using
in the past (I actually thought of implementing a lightweight mini-DOM over
MySQL), but I'm not convinced it's got much of a long-term future... it
might do, and I could well be wrong, I have no strong assertions to make,
simply that one might hope that XML querying will make XML (or at least
XML-aware) databases attractive.
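
Something along these lines is roughly what I mean (just a sketch; SQLite
stands in for MySQL here so the example is self-contained, and the node
table and column names are invented for illustration):

import sqlite3

# One row per DOM node, each pointing at its parent, so a document can be
# pulled out of the database a subtree at a time rather than all at once.
conn = sqlite3.connect("pdom.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS node (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),
    kind      TEXT NOT NULL,   -- 'element', 'text' or 'attribute'
    name      TEXT,            -- tag or attribute name
    value     TEXT             -- text content or attribute value
);
CREATE INDEX IF NOT EXISTS node_parent ON node(parent_id);
""")

def children(node_id):
    # Page in just the immediate children of one node, not the whole tree.
    cur = conn.execute(
        "SELECT id, kind, name, value FROM node WHERE parent_id = ?",
        (node_id,))
    return cur.fetchall()

Whether paging nodes in like that is ever quick enough in practice is
exactly the question mark I mean.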

So, reflecting Tony's advice: use SAX, and if SAX isn't cutting it, then
maybe consider reading into an RDBMS and using SQL =)
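
Something like the following is the kind of thing I have in mind (again
only a sketch: the <record>, <id> and <name> element names and the table
are invented, and SQLite stands in for whatever RDBMS you'd actually use):

import sqlite3
import xml.sax

class RecordLoader(xml.sax.ContentHandler):
    # Stream the document and insert one row per <record> element, so
    # memory use stays flat no matter how big the file is.
    def __init__(self, conn):
        xml.sax.ContentHandler.__init__(self)
        self.conn = conn
        self.row = None    # fields of the <record> currently being read
        self.field = None  # name of the child element currently open
        self.text = []

    def startElement(self, name, attrs):
        if name == "record":
            self.row = {}
        elif self.row is not None:
            self.field = name
            self.text = []

    def characters(self, content):
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == "record":
            self.conn.execute(
                "INSERT INTO record (id, name) VALUES (?, ?)",
                (self.row.get("id"), self.row.get("name")))
            self.row = None
        elif name == self.field:
            self.row[name] = "".join(self.text).strip()
            self.field = None

conn = sqlite3.connect("records.db")
conn.execute("CREATE TABLE IF NOT EXISTS record (id TEXT, name TEXT)")
xml.sax.parse("big.xml", RecordLoader(conn))
conn.commit()

After that the 1 GB file is just rows you can hit with ordinary SQL.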

Oh, and if anybody has decent experience of persistent DOMs, I'd appreciate
the feedback.

Cheers
    Guy.

----- Original Message -----
From: <Tony.Coates@reuters.com>
To: <xml-dev@lists.xml.org>
Sent: Wednesday, July 04, 2001 11:21 AM
Subject: Re: [Question] How to do incremental parsing?


>
> On 04/07/2001 01:27:28 "Xu, Mousheng  (SEA)" wrote:
>
> >A problem of all the current XML parsers is that they at least read the
> >whole XML document into the input stream, which can consume a lot of
> >memory when the XML is big (e.g. 1 GB).
>
> You will generally be told "use SAX not DOM" for large files/streams.
> That's OK if your application can deal with the data in your XML in a
> localised fashion.  And, it has to be said, designing your XML formats to
> work within the constraints of SAX can be a good exercise in avoiding
> structures that require backtracking through the document when they are
> processed.
>
> Still, it often is necessary to backtrack, or make connections between
> parts of a document that may be widely separated in the file/stream.  In
> this case, you want to be able to use something more like DOM, because the
> SAX alternative here would require you to build a store of the information
> that has been parsed, and that means (a) writing more code than you might
> like to, and (b) possibly storing as much information as a DOM tree would
> anyway.  What does seem to be a useful way forward for these kinds of
> problems are persistent DOMs built into databases, such as have been
> appearing recently.  The DOM tree is then paged into memory as required.
> Of course, this is slower than holding the whole DOM tree in memory, but
> the fact is that databases are fast enough to do real stuff with, and if
> that is true for relational tables, it should hold true for persistent
> DOMs.
>
> So, "use SAX or a persistent DOM" for large XML files/streams is what I
> would suggest.
>
>      Cheers,
>           Tony.
[SNIP]