OASIS Mailing List Archives



Re: [xml-dev] DOM or SAX: Sense and Sensibility

On Wed, 2001-11-07 at 18:23, Bullard, Claude L (Len) wrote:
> A topic for the usual Friday introspection two 
> days early:
> How often do you as experienced XML developers 
> find people in your shop using DOM for work 
> more appropriate to SAX?   Have you asked 
> them why and what do they say?  What are the 
> costs of picking the wrong API?

I guess I am in the vast majority of programmers who find DOM-type
(tree-oriented) processing much easier to grasp than SAX processing. It
feels much easier to "be in control" of the document and to act on it
than to let it drive my code.

That said, no matter how cheap memory is, every time I have written
production code that loaded the whole document in memory, I have come
across a couple of documents that just wouldn't fit. I always ended
up either splitting these documents into smaller chunks and adapting my
code, or rewriting the code in an event-oriented way. This was usually a
really painful and stressful task, as those huge documents are usually
pretty important for the user/customer!

Hence I wrote the XML::Twig ( http://www.xmltwig.com/ ) Perl module,
which lets you register handlers on elements (actually you can use a
subset of XPath to determine when to trigger a handler). This way you
can deal with the document chunk by chunk. Once a chunk is completely
processed it can be flushed out, freeing the memory for the next one.
This is usually appropriate: a maximum of 2 passes has always allowed me
to perform all the transformations I needed.
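
To make the chunk-by-chunk idea concrete, here is a rough sketch in
Python. It is an analogue built on the standard library's
xml.etree.ElementTree.iterparse, not XML::Twig's actual API; the
function name process_in_chunks, the handlers dictionary and the sample
document are all made up for the illustration.

```python
import io
import xml.etree.ElementTree as ET

def process_in_chunks(source, handlers):
    """Call handlers[tag](elem) once a matching element is fully parsed,
    then detach it from its parent so its memory can be reclaimed."""
    stack = []  # currently open elements, outermost first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        stack.pop()  # elem just closed; stack[-1] is now its parent
        handler = handlers.get(elem.tag)
        if handler is not None:
            handler(elem)
            if stack:
                # "Flush" the finished chunk: drop it from the tree.
                stack[-1].remove(elem)

# Hypothetical sample input: two <order> chunks inside one root element.
doc = (b"<orders>"
       b"<order id='1'><total>10</total></order>"
       b"<order id='2'><total>32</total></order>"
       b"</orders>")

totals = []
process_in_chunks(
    io.BytesIO(doc),
    {"order": lambda e: totals.append(int(e.findtext("total")))},
)
```

Each handler still sees its chunk as a small tree, but only one chunk
needs to be held in memory at a time.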

The only difficult thing with this model is that it is sometimes too
convenient ;--( : it is very easy to define handlers at several levels
in the tree, and a handler can be called only once an element has been
_completely_ parsed (i.e. when its end tag is parsed). So handlers on
inner tags are called before handlers on outer tags, and by the time
the outer handler is triggered you have to remember that the content of
the sub-element might have already been processed. But I find it a small
price to pay for the convenience of being able to process documents of
any kind using a tree model.
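
The ordering pitfall can be shown with a small sketch, again in Python
with xml.etree.ElementTree.iterparse standing in for the handler
mechanism (this is not XML::Twig code; on_title, on_section and the
sample document are hypothetical names chosen for the example).

```python
import io
import xml.etree.ElementTree as ET

calls = []  # records the order in which handlers run and what they see

def on_title(elem):
    calls.append("title")
    elem.text = elem.text.upper()  # the inner handler rewrites the content

def on_section(elem):
    calls.append("section")
    # By now the <title> inside this section has already been handled.
    calls.append(elem.findtext("title"))

handlers = {"title": on_title, "section": on_section}

doc = b"<doc><section><title>intro</title><para>text</para></section></doc>"

# Handlers fire on "end" events, i.e. once an element is completely parsed,
# so the inner <title> is handled before the enclosing <section>.
for _, elem in ET.iterparse(io.BytesIO(doc), events=("end",)):
    handler = handlers.get(elem.tag)
    if handler is not None:
        handler(elem)
```

Here the section handler finds "INTRO" rather than "intro": the outer
handler has to allow for whatever the inner one has already done.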

As for DOM vs SAX, I think a lot of programmers use the DOM because that's
what they find on the W3C site, and frankly I am pretty scared at the
idea of inexperienced coders using it!

Michel Rodriguez
Perl & XML