
 



 


 

   Re: [xml-dev] SAX and parallel processing


Right. In order to process a SAX stream in parallel you have to copy the 
data in the stream; you can't just "forward" the events. You also have 
to instantiate a context for each event, including at least the 
namespaces in scope and the Location info. I didn't mean to imply this 
would be excessively expensive, just not as lightweight as serially 
processed SAX.
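
A minimal sketch of what "copying the event" could look like in Java, 
using only the standard SAX helper classes. StartElementSnapshot and 
SnapshotHandler are names made up for this illustration, and the 
namespaces in scope are left out for brevity (NamespaceSupport would be 
one way to track them).

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.LocatorImpl;

/** Hypothetical snapshot of one startElement event that outlives the callback. */
final class StartElementSnapshot {
    final String uri, localName, qName;
    final Attributes attributes; // private copy of the parser's Attributes
    final Locator location;      // private copy, or null if no Locator was supplied

    StartElementSnapshot(String uri, String localName, String qName,
                         Attributes atts, Locator locator) {
        this.uri = uri;
        this.localName = localName;
        this.qName = qName;
        // Copy constructors detach the data from objects the parser may reuse.
        this.attributes = new AttributesImpl(atts);
        this.location = (locator == null) ? null : new LocatorImpl(locator);
    }
}

/** Collects snapshots; a real pipeline would hand them to worker threads instead. */
class SnapshotHandler extends DefaultHandler {
    final List<StartElementSnapshot> snapshots = new ArrayList<>();
    private Locator locator;

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        snapshots.add(new StartElementSnapshot(uri, localName, qName, atts, locator));
    }
}
```

Each snapshot costs two small allocations per element, which is the 
overhead relative to serial SAX mentioned above.
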

Bob

Alan Gutierrez wrote:
 > * Bob Foster <bob@objfac.com> [2004-12-31 18:03]:
 >>I have a question, though. What is the guaranteed lifetime of an object
 >>appearing in a SAX event, like an Attributes object, and any objects
 >>used to implement it? If, for example, Attributes were implemented as a
 >>collection of lightweight Attribute objects that were re-used for
 >>subsequent events, the event data could not be passed directly to
 >>parallel threads without copying it. (Or by joining at the end of every
 >>event, which would rather limit the parallelism.)
 >
 >
 >
 >     Xerces recycles Attributes structures for each call to
 >     startElement.
 >
 >     In my library, I keep a stack of attribute structures. The
 >     attribute structures on the stack are recycled for each element
 >     depth, not actually popped and reallocated.
 >
 >     I copy over the values in SAX Attributes to an attributes
 >     structure on this stack, but SAX Attributes are all Strings and
 >     in Java Strings are immutable, so this is really a bunch of
 >     pointer assignments (and the adjustment of an array length
 >     parameter).
 >
 >     Not too expensive to keep that stack around.
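
One way such a recycled, depth-indexed attribute stack might look in 
Java. This is a guess at the general shape, not Alan's actual code; 
AttributeStackHandler and its methods are hypothetical names. It relies 
on AttributesImpl.setAttributes(), which clears the target and copies the 
String references.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class AttributeStackHandler extends DefaultHandler {
    // One reusable AttributesImpl per element depth; slots are never released.
    private final List<AttributesImpl> slots = new ArrayList<>();
    private int depth = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (depth == slots.size()) {
            // Allocate only when the document reaches a new maximum depth.
            slots.add(new AttributesImpl());
        }
        // Overwrite the slot in place with this element's attributes.
        slots.get(depth).setAttributes(atts);
        depth++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // "Pop" by moving the index; the slot stays allocated for the next sibling.
        depth--;
    }

    /** Attributes of the open element at the given level (0 = root). */
    Attributes attributesAt(int level) {
        if (level < 0 || level >= depth) {
            throw new IndexOutOfBoundsException("no open element at level " + level);
        }
        return slots.get(level);
    }

    int allocatedSlots() {
        return slots.size();
    }
}
```

The number of allocated slots is bounded by the maximum nesting depth of 
the document, not by the number of elements.
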
 >
 >         (Because of this, I've come to see streaming problems as SAX
 >             connected stacks of elements. If I need to transform a
 >             document, I chain SAX Strategy Handlers rather than
 >             allowing a Strategy to fiddle with its stack within
 >             the handler.)
 >
 >     The characters event is interesting, because it is an index into
 >     the parse buffer (in theory, and on Xerces indeed), but a
 >     characters event is only ever at the top of the stack. I only
 >     ever need one.
 >
 >     In SAX Strategy, all of the lexemes in the events have a
 >     getImmutable() method that will return an immutable copy (or
 >     return itself if it is immutable) for when a series of events
 >     needs to be recorded.
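
The getImmutable() idea could be sketched roughly as follows. Only the 
method name getImmutable() comes from the description above; the Lexeme 
interface, both classes, text() and reset() are hypothetical and are not 
taken from SAX Strategy itself.

```java
/** Hypothetical lexeme type; illustrative only, not SAX Strategy's actual API. */
interface Lexeme {
    String text();

    /** Returns a version of this lexeme that is safe to keep after the event. */
    Lexeme getImmutable();
}

final class ImmutableLexeme implements Lexeme {
    private final String text;

    ImmutableLexeme(String text) {
        this.text = text;
    }

    public String text() {
        return text;
    }

    public Lexeme getImmutable() {
        return this; // already immutable, so no copy is needed
    }
}

final class MutableLexeme implements Lexeme {
    // Reused across events, so its contents change from one event to the next.
    private final StringBuilder buffer = new StringBuilder();

    void reset(char[] ch, int start, int length) {
        buffer.setLength(0);
        buffer.append(ch, start, length);
    }

    public String text() {
        return buffer.toString();
    }

    public Lexeme getImmutable() {
        return new ImmutableLexeme(buffer.toString()); // detached copy
    }
}
```

A recorder would call getImmutable() on everything it keeps, while a 
handler that finishes with the data inside the callback can use the 
mutable instance directly and skip the copy.
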
 >
 >         (Not yet implemented, but if one was buffering and releasing
 >             nodes, they could use the mutable lexemes and events to
 >             implement a cache.)
 >
 >     I need to look harder, but I suspect that the handful of
 >     workhorse SAX ContentHandlers I use, that I get from outside my
 >     library, are probably self contained. Things like DOM4J's
 >     ContentBuilder, and the SAXTransformers of Saxon, via TRAX.
 >
 > --
 > Alan Gutierrez - alan@engrm.com





 


Copyright 2001 XML.org. This site is hosted by OASIS