Right. In order to process a SAX stream in parallel you have to copy the
data in the stream; you can't just "forward" the events. You also have
to instantiate a context for each event, including at least the
namespaces in scope and the Location info. I didn't mean to imply this
would be excessively expensive, just not as lightweight as serially
processed SAX.
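
A rough sketch of the kind of copying I mean, assuming Java and the
standard org.xml.sax API. The names StartElementSnapshot and
SnapshotHandler are made up for illustration, and the sink stands in
for whatever queue or executor hands events to the worker threads:

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical self-contained copy of one startElement event. */
final class StartElementSnapshot {
    final String uri, localName, qName;
    final Attributes attributes;           // private copy, not the parser's object
    final Map<String, String> namespaces;  // prefix -> URI in scope ("" = default)
    final int line, column;                // -1 when the parser gave no Locator

    StartElementSnapshot(String uri, String localName, String qName,
                         Attributes attributes, Map<String, String> namespaces,
                         int line, int column) {
        this.uri = uri;
        this.localName = localName;
        this.qName = qName;
        this.attributes = attributes;
        this.namespaces = namespaces;
        this.line = line;
        this.column = column;
    }
}

/** Copies each startElement event before passing it on (assumed design). */
class SnapshotHandler extends DefaultHandler {
    private final Consumer<StartElementSnapshot> sink;
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();
    private Map<String, String> pending;   // declarations for the next element
    private Locator locator;

    SnapshotHandler(Consumer<StartElementSnapshot> sink) {
        this.sink = sink;
    }

    private Map<String, String> current() {
        return scopes.isEmpty() ? Collections.<String, String>emptyMap() : scopes.peek();
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        // Copy-on-write: a map is never modified once it is on the stack.
        if (pending == null) {
            pending = new HashMap<>(current());
        }
        pending.put(prefix, uri);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Map<String, String> scope = (pending != null) ? pending : current();
        pending = null;
        scopes.push(scope);
        sink.accept(new StartElementSnapshot(
                uri, localName, qName,
                new AttributesImpl(atts),            // the parser may recycle atts
                Collections.unmodifiableMap(scope),
                locator != null ? locator.getLineNumber() : -1,
                locator != null ? locator.getColumnNumber() : -1));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        scopes.pop();
    }
}
```

Each snapshot carries its own attributes, namespace map and position,
so a consumer can keep it after the callback returns. The per-event
allocations are the extra cost compared with serial handling.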
Bob
Alan Gutierrez wrote:
> * Bob Foster <bob@objfac.com> [2004-12-31 18:03]:
>>I have a question, though. What is the guaranteed lifetime of an object
>>appearing in a SAX event, like an Attributes object, and any objects
>>used to implement it? If, for example, Attributes were implemented as a
>>collection of lightweight Attribute objects that were re-used for
>>subsequent events, the event data could not be passed directly to
>>parallel threads without copying it. (Or by joining at the end of every
>>event, which would rather limit the parallelism.)
>
>
>
> Xerces recycles Attributes structures for each call to
> startElement.
>
> In my library, I keep a stack of attribute structures. The
> attribute structures on the stack are recycled for each element
> depth, not actually popped and reallocated.
>
> I copy over the values in SAX Attributes to an attributes
> structure on this stack, but SAX Attributes are all Strings and
> in Java Strings are immutable, so this is really a bunch of
> pointer assignments (and the adjustment of an array length
> parameter).
>
> Not too expensive to keep that stack around.
>
> (Because of this, I've come to see streaming problems as SAX
> connected stacks of elements. If I need to transform a
> document, I chain SAX Strategy Handlers. This, rather
> than allow a Strategy to fiddle with its stack within
> the handler.)
>
> The characters event is interesting, because it is an index into
> the parse buffer (in theory, and on Xerces indeed), but a
> characters event is only ever at the top of the stack. I only
> ever need one.
>
> In SAX Strategy, all of the lexemes in the events have a
> getImmutable() method that will return an immutable copy (or
> return itself if it is immutable) for when a series of events
> needs to be recorded.
>
> (Not yet implemented, but if one was buffering and releasing
> nodes, they could use the mutable lexemes and events to
> implement a cache.)
>
> I need to look harder, but I suspect that the handful of
> workhorse SAX ContentHandlers I use, that I get from outside my
> library, are probably self contained. Things like DOM4J's
> ContentBuilder, and the SAXTransformers of Saxon, via TRAX.
>
> --
> Alan Gutierrez - alan@engrm.com
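
The recycled attribute stack described in the quoted message might look
roughly like the following in Java. This is only a sketch under my own
assumptions, not code from Alan's library: AttrFrame and
FrameStackHandler are invented names, and attributes are keyed by qName
for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical reusable holder for one element's attributes. */
final class AttrFrame {
    private String[] names = new String[8];
    private String[] values = new String[8];
    private int length;

    /** Copies references only; Java Strings are immutable, so no deep copy. */
    void load(Attributes atts) {
        int n = atts.getLength();
        if (n > names.length) {
            names = new String[n];
            values = new String[n];
        }
        for (int i = 0; i < n; i++) {
            names[i] = atts.getQName(i);
            values[i] = atts.getValue(i);
        }
        length = n;  // entries past length are stale and ignored
    }

    String get(String qName) {
        for (int i = 0; i < length; i++) {
            if (names[i].equals(qName)) {
                return values[i];
            }
        }
        return null;
    }
}

/** Keeps one AttrFrame per element depth and reuses it for siblings. */
class FrameStackHandler extends DefaultHandler {
    private final List<AttrFrame> frames = new ArrayList<>();
    private int depth;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (depth == frames.size()) {
            frames.add(new AttrFrame());  // allocate only on first visit to this depth
        }
        frames.get(depth).load(atts);
        depth++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;  // the frame stays allocated for the next sibling
    }

    /** Attributes of the ancestor `up` levels above the current element (0 = current). */
    AttrFrame ancestor(int up) {
        return frames.get(depth - 1 - up);
    }

    int allocatedFrames() {
        return frames.size();
    }
}
```

The number of frames allocated is bounded by the maximum element depth
rather than the element count. A frame is overwritten by the next
sibling at the same depth, so anything that must outlive the element
still needs its own copy.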