I totally agree, Michael. I would like to see a mixed model which allows
XQuery filtering of events and similar behavior. Some layers based on
SAX filters seem to come close to this goal.
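To make the "layers based on SAX filters" idea concrete, here is a minimal sketch using the standard org.xml.sax.helpers.XMLFilterImpl. It is not XQuery: the selection rule is only "drop every event inside elements with a given name", and the class names SkipElementFilter and FilterDemo are hypothetical, invented for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: suppresses every event inside elements with a given name. */
class SkipElementFilter extends XMLFilterImpl {
    private final String skipName;
    private int skipDepth = 0; // greater than 0 while inside a skipped subtree

    SkipElementFilter(XMLReader parent, String skipName) {
        super(parent);
        this.skipName = skipName;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (skipDepth > 0 || qName.equals(skipName)) {
            skipDepth++; // entering (or nesting deeper into) a skipped subtree
            return;
        }
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (skipDepth > 0) {
            skipDepth--;
            return;
        }
        super.endElement(uri, local, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }
}

class FilterDemo {
    /** Returns the names of the elements that survive the filter, in document order. */
    static List<String> survivingElements(String xml, String skipName) {
        List<String> seen = new ArrayList<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            SkipElementFilter filter = new SkipElementFilter(reader, skipName);
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes a) {
                    seen.add(qName);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

Since each filter is itself an XMLReader, several of them can be chained, with the downstream handler seeing only what the whole chain lets through.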
> Rick Jelliffe wrote:
> > I am idly wondering whether unpooled streaming Java APIs of XML
> > (e.g. SAX) really make as much sense as we might like them to.
> I've been wondering why we have such absolute either-or choices in
> available APIs. Why not hybridized APIs that provide event streams but
> let you collect arbitrary spans of content into an object model that
> can be more easily manipulated and accessed, without needing a complete
> in-memory tree model of the entire document?
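One way such a hybrid could look, as a rough sketch only: a SAX handler that streams past most of the document and builds a small tree just for the spans of interest. The standard DOM is used here as the object model purely for convenience (the post does not name one), and FragmentCollector, HybridDemo, and the element name "item" are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: streams the document, building a DOM fragment per matching element. */
class FragmentCollector extends DefaultHandler {
    private final String targetName;
    private final Consumer<Element> sink; // receives each completed fragment
    private final Document nodeFactory;   // used only to create nodes, never as a full tree
    private Node current;                 // null while outside a collected span

    FragmentCollector(String targetName, Consumer<Element> sink)
            throws ParserConfigurationException {
        this.targetName = targetName;
        this.sink = sink;
        this.nodeFactory = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (current == null && !qName.equals(targetName)) {
            return; // plain streaming: nothing is retained
        }
        Element e = nodeFactory.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        if (current != null) {
            current.appendChild(e);
        }
        current = e;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.appendChild(nodeFactory.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (current == null) {
            return;
        }
        Node parent = current.getParentNode();
        if (parent == null) {
            sink.accept((Element) current); // fragment root closed: hand the span over
        }
        current = parent; // becomes null again once the fragment root is closed
    }
}

class HybridDemo {
    /** Collects the text content of every "item" fragment in the given document. */
    static List<String> itemTexts(String xml) {
        List<String> texts = new ArrayList<>();
        try {
            FragmentCollector handler =
                    new FragmentCollector("item", e -> texts.add(e.getTextContent()));
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
        return texts;
    }
}
```

Only one fragment is alive at a time unless the sink chooses to keep it, so memory use is bounded by the largest collected span rather than by the whole document.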
> I think both SAX and tree APIs are unwieldy to work with. I'm more
> interested in rule-based and pattern-based approaches, but prefer not
> to have to build a complete in-memory model of the entire document to
> enable such an approach.
> > It strikes me that there are two factors that undermine the benefits
> > of streaming processing:
> > * XML documents are rarely larger than memory
> > * Java implementations typically only garbage collect when they get
> > close to filling their heaps.
> > These two things conspire to make it that, for the lion's share of
> > documents, by the time the SAX stream is finished, all the SAX events
> > will be in memory, though perhaps unreachable. If they are in memory,
> > why not make them available?
> > That being the case, it seems that simple streaming APIs such as SAX
> > don't make sense. They would be better to either
> > * have the SAX stream kept cached for the lifetime of the document
> > (or have some kind of weak reference perhaps) since they are in
> > memory anyway (though unreachable), allowing backward-looking XPaths; or
> Pooling objects using weak references incurs a small performance
> penalty (I've experimented a bit with such approaches, though not for
> SAX events). In the context of a real-world application this penalty is
> likely to be pretty minimal. Nonetheless, if someone is using SAX, it
> may be because they are trying to maximize performance.
> > * require SAX clients to return events to a pool (which would reduce
> > memory use).
> > Does that sound right to anyone?
> The approach I'm experimenting with, right now, in my swan toolkit
> (http://swan.sourceforge.net) is maintaining a stack to support
> backward-looking XPaths and XSLT pattern-matching, melded with rules
> that can gather content into suitable data structures for relevant
> portions of a document. As part of that, I have a prefab rule one can
> use to gather up a fragment into a minimalistic tree API that supports
> XPath queries. This could easily be adapted to use a full-fledged tree
> API for the fragment, but I was more interested in using XPath
> expressions than navigating unwieldy tree APIs.
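The stack idea described above can be illustrated with a short sketch. This is not code from swan: the handler keeps a stack of the currently open element names and matches a simplified pattern (slash-separated element names only, far short of real XPath or XSLT patterns) against the top of that stack. AncestorStackMatcher and StackDemo are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: matches a pattern such as "chapter/title" against the open elements. */
class AncestorStackMatcher extends DefaultHandler {
    private final String[] pattern;                         // element names, outermost first
    private final Deque<String> open = new ArrayDeque<>();  // innermost open element on top
    private final List<String> matches = new ArrayList<>(); // text of each matched element
    private StringBuilder text;                             // non-null while inside a match
    private int matchDepth = -1;                            // stack depth of the matched element

    AncestorStackMatcher(String pattern) {
        this.pattern = pattern.split("/");
    }

    List<String> matches() {
        return matches;
    }

    /** True when the innermost open elements, read outward, equal the pattern read backward. */
    private boolean stackMatches() {
        if (open.size() < pattern.length) {
            return false;
        }
        Iterator<String> fromInnermost = open.iterator(); // ArrayDeque iterates from the top
        for (int i = pattern.length - 1; i >= 0; i--) {
            if (!fromInnermost.next().equals(pattern[i])) {
                return false;
            }
        }
        return true;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        open.push(qName);
        if (text == null && stackMatches()) {
            text = new StringBuilder(); // start gathering content for this match
            matchDepth = open.size();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (text != null && open.size() == matchDepth) {
            matches.add(text.toString()); // the matched element itself is closing
            text = null;
            matchDepth = -1;
        }
        open.pop();
    }
}

class StackDemo {
    /** Returns the text of every element matched by the pattern, in document order. */
    static List<String> select(String xml, String pattern) {
        AncestorStackMatcher handler = new AncestorStackMatcher(pattern);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.matches();
    }
}
```

The stack holds only the ancestors of the current event, so looking backward along the ancestor axis costs memory proportional to document depth, not document size.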
> This is still all in a rough state. I haven't done a file release of
> this code yet, and some key portions are not in CVS yet (due to some
> problems I've been having with CVS integration with Eclipse). I've
> been letting this languish the last few weeks, but am starting to get
> back into it this weekend. I've been approaching this in a rather lazy
> fashion (my motivation has been admittedly low), but I hope to have an
> alpha release of something soon.
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
> The list archives are at http://lists.xml.org/archives/xml-dev/
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://lists.xml.org/ob/adm.pl>