   Re: [xml-dev] SAX and parallel processing

* David Megginson <david.megginson@gmail.com> [2004-12-31 13:15]:
> On Fri, 31 Dec 2004 11:57:44 -0500, Alan Gutierrez
> <alan-xml-dev@engrm.com> wrote:
> >     It tells me that SAX, as an API, is missing support for parallel
> >     processing. Parallel processing is possible in SAX, or rather,
> >     it is possible with ContentHandler implementations that were
> >     co-operative and synchronized, but then you're forgetting an
> >     interface to specify ContentHandlers, and a bunch of other whatnot.
> I think that there's been a terrible amount of confusion on this thread.
> As soon as the filter pipeline design pattern started to become
> popular for applications based on SAX1, I assumed that people would
> insert tee-joints in the pipeline to allow for parallel processing
> when they needed it, especially since thread management in Java is so
> easy.  Note that we're talking about parallel threads performing
> different operations on the *same* sequential event stream (i.e. one
> thread might be populating a database, while another is producing an
> HTML page).

> I can also conceive of applications where different threads deal
> with different parts of the document, as long as the source event
> stream stays single-thread and sequential -- for example, a filter
> might divert a series of events representing a document subtree to
> a separate thread that builds a data structure and performs
> time-consuming operations while the rest of the event stream
> continues on to the original thread (which was briefly suspended
> waiting for more input).

> So the source of the SAX event stream has to be sequential, but
> there's no reason that the rest of the filter pipeline cannot be
> parallelized.
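
    The tee-joint described above might be sketched roughly as follows.
    This is only an illustration under stated assumptions, not code from
    SAX itself or from any library mentioned in this thread: the names
    ThreadedTee, ThreadedBranch, SaxEvent and CountingHandler are all
    hypothetical. The parser stays single-threaded and sequential; each
    branch replays the same events on its own thread. Attributes and
    character buffers are copied because a parser may reuse them after
    the callback returns.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class ThreadedTeeDemo {
    public static void main(String[] args) throws Exception {
        CountingHandler database = new CountingHandler(); // stand-in for "populate a database"
        CountingHandler html = new CountingHandler();     // stand-in for "produce an HTML page"
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(new StringReader("<r><x/><y>t</y></r>")),
            new ThreadedTee(database, html));
        System.out.println(database.elements + " " + html.elements);
    }
}

/** One recorded SAX event that can be replayed against any ContentHandler. */
interface SaxEvent {
    void replay(ContentHandler target) throws SAXException;
}

/** Hypothetical branch of the tee: a queue plus a worker thread that replays events in order. */
class ThreadedBranch {
    private static final SaxEvent END = target -> { }; // sentinel that stops the worker
    private final BlockingQueue<SaxEvent> queue = new LinkedBlockingQueue<>();
    private final Thread worker;
    private volatile Exception failure;

    ThreadedBranch(ContentHandler target) {
        worker = new Thread(() -> {
            try {
                for (SaxEvent e = queue.take(); e != END; e = queue.take()) {
                    e.replay(target);
                }
            } catch (SAXException | InterruptedException ex) {
                failure = ex; // reported to the parsing thread in finish()
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive if parsing aborts early
        worker.start();
    }

    void post(SaxEvent e) { queue.add(e); }

    /** Waits for the worker to drain its queue, then rethrows any failure it hit. */
    void finish() throws SAXException {
        queue.add(END);
        try {
            worker.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        if (failure != null) throw new SAXException(failure);
    }
}

/** Hypothetical tee-joint: receives events on the parser thread and fans them out. */
class ThreadedTee extends DefaultHandler {
    private final List<ThreadedBranch> branches = new ArrayList<>();

    ThreadedTee(ContentHandler... targets) {
        for (ContentHandler t : targets) branches.add(new ThreadedBranch(t));
    }

    private void post(SaxEvent e) {
        for (ThreadedBranch b : branches) b.post(e);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        Attributes copy = new AttributesImpl(atts); // the parser may reuse atts
        post(h -> h.startElement(uri, local, qName, copy));
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        post(h -> h.endElement(uri, local, qName));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        char[] copy = Arrays.copyOfRange(ch, start, start + length); // the buffer may be reused too
        post(h -> h.characters(copy, 0, copy.length));
    }

    @Override
    public void endDocument() throws SAXException {
        for (ThreadedBranch b : branches) b.finish(); // wait for every branch to catch up
    }
}

/** Trivial example branch: counts start tags. */
class CountingHandler extends DefaultHandler {
    int elements;

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        elements++;
    }
}
```

    Only three callbacks are forwarded here to keep the sketch short; a
    fuller tee would relay the remaining ContentHandler events the same
    way. The copied event data is shared read-only between branches.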

    In my SAX Strategy library, the direction has gone toward making
    the Strategies stateless and placing information in a stack
    that is maintained by a SAX ContentHandler/LexicalHandler. This
    lends itself to parallel processing, and I figured it was only a
    matter of time before I was performing multiplexing, updating an
    Oracle database in one thread and an LDAP directory in another.
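
    To make the idea of stateless strategies plus a handler-maintained
    stack concrete, here is a minimal sketch. It is not the SAX Strategy
    library's actual API: the Strategy interface, StrategyHandler and
    CollectIds are hypothetical names invented for illustration, and the
    frame layout (one map per open element, with a bottom frame for
    results) is an assumption. The point is only that a strategy with no
    fields can be shared freely, since all working data lives on the
    stack it is handed.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class StrategyDemo {
    public static void main(String[] args) throws Exception {
        StrategyHandler handler = new StrategyHandler(Map.of("item", new CollectIds()));
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(new StringReader("<list><item id=\"a\"/><item id=\"b\"/></list>")),
            handler);
        System.out.println(handler.stack.peek().get("ids"));
    }
}

/** Hypothetical stateless strategy: it keeps no fields and works only on the stack. */
interface Strategy {
    void start(Deque<Map<String, Object>> stack, Attributes atts);
    void end(Deque<Map<String, Object>> stack);
}

/** The handler owns the stack: one frame per open element, plus a bottom frame for results. */
class StrategyHandler extends DefaultHandler {
    private final Map<String, Strategy> strategies;
    final Deque<Map<String, Object>> stack = new ArrayDeque<>();

    StrategyHandler(Map<String, Strategy> strategies) {
        this.strategies = strategies;
        stack.push(new HashMap<>()); // bottom frame, outlives every element
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        stack.push(new HashMap<>()); // fresh frame for this element
        Strategy s = strategies.get(qName);
        if (s != null) s.start(stack, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        Strategy s = strategies.get(qName);
        if (s != null) s.end(stack); // the element's own frame is still on top here
        stack.pop();
    }
}

/** Example strategy: remembers each element's id, then files it in the bottom frame. */
class CollectIds implements Strategy {
    @Override
    public void start(Deque<Map<String, Object>> stack, Attributes atts) {
        stack.peek().put("id", atts.getValue("id"));
    }

    @Override
    @SuppressWarnings("unchecked")
    public void end(Deque<Map<String, Object>> stack) {
        Object id = stack.peek().get("id");
        List<Object> ids = (List<Object>) stack.peekLast()
            .computeIfAbsent("ids", k -> new ArrayList<Object>());
        ids.add(id);
    }
}
```

    Because CollectIds holds no state of its own, the same instance could
    be registered with several handlers, each with its own stack.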

    I've yet to sort out how to bring the stream of events back together.

    From the viewpoint of SAX, I see XML less as a document and more
    as a series of events, so I've chosen to, more or less, dispose
    of the startDocument and endDocument events in my library. They
    are there, but they are second-class citizens, and logic
    generally begins with the first element event.

    Indeed, I do parallel processing of the sort you're describing.
    I've often multiplexed a SAX event stream to do two things at
    once, but within the same thread.
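
    Same-thread multiplexing of this kind can be as small as a handler
    that forwards each event to a list of other handlers. The sketch
    below is illustrative only; TeeHandler, ElementCounter and
    TextCollector are hypothetical names, and only three callbacks are
    forwarded to keep it short.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class TeeDemo {
    public static void main(String[] args) throws Exception {
        ElementCounter counter = new ElementCounter();
        TextCollector text = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(new StringReader("<a><b>hi</b><c/></a>")),
            new TeeHandler(List.of(counter, text)));
        System.out.println(counter.count + " elements, text: " + text.text);
    }
}

/** Hypothetical same-thread tee: forwards each event to every branch, in order. */
class TeeHandler extends DefaultHandler {
    private final List<ContentHandler> branches;

    TeeHandler(List<ContentHandler> branches) {
        this.branches = branches;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        for (ContentHandler h : branches) h.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        for (ContentHandler h : branches) h.endElement(uri, local, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        for (ContentHandler h : branches) h.characters(ch, start, length);
    }
}

/** Example branch: counts start tags. */
class ElementCounter extends DefaultHandler {
    int count;

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        count++;
    }
}

/** Example branch: accumulates character data. */
class TextCollector extends DefaultHandler {
    final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }
}
```

    No copying of attributes or buffers is needed here, since every
    branch finishes with an event before the parser moves on.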

    For some reason, perhaps misguided, I see a lot of potential for
    SAX and streaming XML.

Alan Gutierrez - alan@engrm.com

