I am idly wondering whether unpooled streaming Java APIs for XML documents (e.g. SAX)
really make as much sense as we might like them to.
It strikes me that there are two factors that undermine the benefits of streaming processing:
* XML documents are rarely larger than available memory
* Java implementations typically only garbage collect when they get "near"
to filling their heaps.
These two things conspire so that, for the lion's share of documents,
by the time the SAX stream is finished, all the SAX events will still be
in memory, though perhaps unreachable. If they are in memory, why not
make them available?
That being the case, it seems that simple streaming of the kind SAX provides
doesn't make sense. It would be better either to
* keep the SAX stream cached for the lifetime of the document
(or perhaps hold it through some kind of weak reference), since the events
are in memory anyway (though unreachable), allowing backward-looking XPaths; or
* require SAX clients to return events to a pool (which would reduce
memory use).
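For illustration, here is a minimal Java sketch of the first option (caching the event stream for the lifetime of the document), assuming only the standard JAXP/SAX classes shipped with the JDK. The names RecordingHandler, Event and ancestors are hypothetical and not part of SAX; attributes, namespaces and other event types are dropped to keep it short. The ancestors method shows the kind of backward-looking query (roughly the XPath ancestor:: axis) that becomes possible once the events are kept reachable.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that keeps every SAX event for the lifetime of the
// document instead of letting it become garbage, so later code can look backward.
public class RecordingHandler extends DefaultHandler {

    // Minimal cached event: attributes, namespaces, PIs etc. are omitted for brevity.
    public static final class Event {
        public final String kind;  // "START", "END" or "TEXT"
        public final String name;  // element qName, or null for TEXT
        public final String text;  // character data, or null for elements
        public final int depth;    // element nesting depth (root element = 0)

        Event(String kind, String name, String text, int depth) {
            this.kind = kind;
            this.name = name;
            this.text = text;
            this.depth = depth;
        }
    }

    private final List<Event> events = new ArrayList<>();
    private int depth = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add(new Event("START", qName, null, depth));
        depth++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
        events.add(new Event("END", qName, null, depth));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Copy the text: the parser is free to reuse the ch buffer.
        events.add(new Event("TEXT", null, new String(ch, start, length), depth));
    }

    public List<Event> events() {
        return events;
    }

    // Backward-looking query, roughly the XPath ancestor:: axis:
    // walk back through the cache collecting enclosing element names, nearest first.
    public List<String> ancestors(int index) {
        List<String> result = new ArrayList<>();
        int target = events.get(index).depth;
        for (int i = index - 1; i >= 0 && target > 0; i--) {
            Event e = events.get(i);
            if (e.kind.equals("START") && e.depth < target) {
                result.add(e.name);
                target = e.depth;
            }
        }
        return result;
    }

    // Parse an XML string with the JDK's default SAX parser, caching all events.
    public static RecordingHandler parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RecordingHandler handler = new RecordingHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        RecordingHandler h = parse("<a><b>x</b></a>");
        List<Event> evs = h.events();
        for (int i = 0; i < evs.size(); i++) {
            if (evs.get(i).kind.equals("TEXT")) {
                System.out.println(h.ancestors(i)); // prints [b, a]
            }
        }
    }
}
```

The cache trades memory for the ability to answer such queries after the stream has passed; swapping the List for a weakly referenced holder would be the "weak reference" variant. The pooling alternative would need some release hook for clients to hand events back, which SAX itself does not define.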
Does that sound right to anyone?
Cheers
Rick Jelliffe