OASIS Mailing List Archives



Some clarifications -- RE: [Question] How to do incremental parsing?

Dear All,

It's amazing to get so many replies when I came to work this morning.
Sorry I cannot make replies individually. Here are some clarifications:

* I was wrong in saying that SAX reads the whole doc into memory. I meant to
say that about DOM with lazy evaluation.

* A Java DOM parser is ultimately what I am looking for.

The problem with SAX is that you have to write all that tedious
"startElement"/"endElement" code again for each XML file of a different
format, and the parsing never stops! Perl modules, or another scripting
language like OmniMark, are not an option because they are not in Java.
Putting an XML doc into an RDBMS is not an option either, because it is
only an awkward temporary solution. Guy Murphy mentioned the possibility of
"don't use XML", but a generic XML parser is what I am looking for;
otherwise it's going to be a nightmare each time a large XML file has to
be dealt with.
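
On the "parsing never stops" point, one common workaround is to abort the
parse by throwing an exception from the handler once enough has been read.
The sketch below assumes a JAXP SAX parser; the class EarlyStopSax, the
method firstText and the StopParsing exception are hypothetical names made
up for illustration, not part of any library.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class EarlyStopSax {
    /** Hypothetical marker exception, used only to abort the parse. */
    static class StopParsing extends SAXException {
        StopParsing() { super("stop requested"); }
    }

    /** Returns the text of the first element called name, or null if none. */
    static String firstText(String xml, final String name) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inside = false;

            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                if (qName.equals(name)) inside = true;
            }

            @Override
            public void characters(char[] ch, int start, int len) {
                if (inside) text.append(ch, start, len);
            }

            @Override
            public void endElement(String uri, String local, String qName)
                    throws SAXException {
                // First match is complete: abandon the rest of the document.
                if (inside && qName.equals(name)) throw new StopParsing();
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            if (e instanceof StopParsing) return text.toString();
            throw e; // a real parse error
        }
        return null; // reached the end without finding the element
    }
}
```

The handler itself is generic: only the element name changes from one file
format to the next.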

Some mentioned the row processing feature of dom4j, kXML, SAXON, minidom,
easydom, and Orchard. Do they read the whole doc into memory before parsing
anyway, like DOM with lazy evaluation? If these parsers are based on Xerces
SAX, chances are the whole doc is read into memory.

An incremental SAX parser such as the suggested MSXML SAX parser seems to be
the closest fit, but an incremental DOM parser would still have to be built
on top of it. Ajay, do you have a quick reference on MSXML?
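
To make the idea of a DOM layer on top of SAX concrete, here is a rough
sketch in Java using the standard JAXP and org.w3c.dom interfaces. It
builds a small DOM subtree for each "record" element, hands it to a
callback, and then lets it go, so only one record is held at a time. The
names RecordDomReader, RecordHandler and onRecord are hypothetical, and
namespaces, comments and processing instructions are ignored for brevity.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class RecordDomReader {
    /** Hypothetical callback: receives one record subtree at a time. */
    interface RecordHandler {
        void onRecord(Element record);
    }

    static void read(InputSource in, final String recordName,
                     final RecordHandler out) throws Exception {
        // Empty document used only as a factory for detached nodes.
        final Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        DefaultHandler handler = new DefaultHandler() {
            private Node current = null; // null while outside a record

            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                if (current == null && !qName.equals(recordName)) return;
                Element e = doc.createElement(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    e.setAttribute(atts.getQName(i), atts.getValue(i));
                }
                if (current != null) current.appendChild(e);
                current = e;
            }

            @Override
            public void characters(char[] ch, int start, int len) {
                if (current != null) {
                    current.appendChild(
                            doc.createTextNode(new String(ch, start, len)));
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (current == null) return;
                Node parent = current.getParentNode();
                // The record root was never attached, so its parent is null.
                if (parent == null) out.onRecord((Element) current);
                current = parent;
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
    }
}
```

Each record subtree is never attached to the document, so it can be
garbage-collected once the callback returns and nothing else refers to it.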

* What is "persistent DOM"?

Thanks a lot.

-- Mousheng Xu
