SaxXPathFragmentFilter - Reduce large DOM trees using a SAX XPath cutter!
I have made an experimental SAX XMLFilter. It lets you "filter" out
the information in an XML document that you want to work with - using
XPath - and skip the rest. You can place the filter anywhere in your
application where an XMLFilter can be used.
- I don't know whether this has already been done by others.
The whole idea is to "filter" out the fragments of the XML document
that you specify with an XPath expression, e.g.
SaxXPathFragmentFilter(saxparser, "/cellphone/*/model[@id='1234']",
"result"). You can build a DOM tree from the result, or feed the SAX
events into an XSLT transformer and do some XSLT transformations.
The big win is that you don't have to build a large DOM tree if you
only need part of the information in a large XML document. You just
specify which fragments you want using XPath, and the result is a
much smaller DOM tree, which requires less processing, memory etc.
Let us say that you have a large document with spare parts for Volvo
vehicles. You want to make a list of engine parts for the S70 car model.
What you do is specify the XPath (location path) that you want to cut
out of the document, e.g. "/catalog/cars/s70/parts/engine".
    // imports assumed for the SAX and JDOM classes used below
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.XMLReaderFactory;
    import org.jdom.Document;
    import org.jdom.input.SAXHandler;

    // your SAX parser here
    XMLReader parser = XMLReaderFactory.createXMLReader(
            "org.apache.xerces.parsers.SAXParser");

    // get an instance of the JDOM handler that builds the document
    SAXHandler jdomsaxhandler = new SAXHandler();

    String xpath = "/catalog/cars/s70/parts/engine";
    String rootName = "s70engineparts"; // this will be the new root

    // set up the SaxXPathFragmentFilter
    SaxXPathFragmentFilter xpathfilter =
            new SaxXPathFragmentFilter(parser, xpath, rootName);
    xpathfilter.setContentHandler(jdomsaxhandler);

    // parse the document (uri is the location of your XML document)
    xpathfilter.parse(uri);

    // get the Document
    Document doc = jdomsaxhandler.getDocument();
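Instead of building a JDOM document, the SAX events could also be fed
into an XSLT transformer, as mentioned earlier. The following is only a
rough sketch using the standard JAXP TransformerHandler. The class
SaxToXsltSketch and its methods are hypothetical names made up for this
example, and a plain XMLReader stands in for the filter so that the
sketch runs on its own; in real use the filter would take the reader's
place, since it is an XMLReader too.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper, not part of SaxXPathFragmentFilter.
class SaxToXsltSketch {

    // A namespace-aware reader; XSLT processors expect namespace-aware events.
    static XMLReader plainReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Sends the SAX events produced by 'source' through the stylesheet
    // and returns the transformation output as a string.
    static String transform(XMLReader source, String xml, String xslt) {
        try {
            SAXTransformerFactory stf =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = stf.newTransformerHandler(
                    new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));

            // The transformer handler is an ordinary SAX ContentHandler.
            source.setContentHandler(handler);
            source.parse(new InputSource(new StringReader(xml)));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling transform(xpathfilter, ...) instead of transform(plainReader(),
...) would, under these assumptions, transform only the fragments that
the filter lets through.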
This SaxXPathFragmentFilter is purely experimental. It is spaghetti code.
I just sat down with an idea and started to code, and the code is not
very pretty. It needs to be rewritten.
The XPath support is very limited for now. Here is the XPath you can use
with this filter today:
"/a/b" - An absolute path.
"/a/*/c" - An absolute path where the second step "*" can be
any element.
"/a/*/c[@att='value']" - Element c has an attribute att with the
value 'value'.
"/a/*/c[contains='value']" - The first child node of element c is a
text node that contains 'value'.
"/a/*/c[starts-with='value']" - The first child node of element c is a
text node that starts with 'value'.
"/a/*/c[ends-with='value']" - The first child node of element c is a
text node that ends with 'value'.
"/a/*/c['value']" - The first child node of element c is a text node
that is 'value'.
"/a/*/c[is='value']" - As above.
As you can see, the XPath options are very limited. But I think that
when I find a way to implement the "//" pattern, the filter will be even
more powerful.
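To illustrate how the two simplest patterns in the list ("/a/b" and
"/a/*/c") can be matched while streaming, here is a minimal sketch. It
is not the code in the tarball: PathFilterSketch and elementNames are
hypothetical names, predicates such as [@att='value'] and the "//"
pattern are left out, and only element and character events are
filtered. The idea is to keep the names of the open elements in a list,
compare that list with the steps of the path, and forward events only
while inside a matching element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch, not the SaxXPathFragmentFilter source.
class PathFilterSketch extends XMLFilterImpl {
    private final String[] steps;                        // "/a/*/c" -> {"a", "*", "c"}
    private final String rootName;                       // name of the new result root
    private final List<String> path = new ArrayList<>(); // names of the open elements
    private int matchDepth = -1;                         // depth of the match, -1 when outside

    PathFilterSketch(XMLReader parent, String absolutePath, String rootName) {
        super(parent);
        this.steps = absolutePath.substring(1).split("/");
        this.rootName = rootName;
    }

    // True if the open elements match the path step by step ("*" matches any name).
    private boolean pathMatches() {
        if (path.size() != steps.length) return false;
        for (int i = 0; i < steps.length; i++) {
            if (!steps[i].equals("*") && !steps[i].equals(path.get(i))) return false;
        }
        return true;
    }

    @Override public void startDocument() throws SAXException {
        super.startDocument();
        super.startElement("", rootName, rootName, new AttributesImpl()); // wrap fragments
    }

    @Override public void endDocument() throws SAXException {
        super.endElement("", rootName, rootName);
        super.endDocument();
    }

    @Override public void startElement(String uri, String local, String qName,
                                       Attributes atts) throws SAXException {
        path.add(qName);
        if (matchDepth < 0 && pathMatches()) matchDepth = path.size();
        if (matchDepth >= 0) super.startElement(uri, local, qName, atts);
    }

    @Override public void endElement(String uri, String local, String qName)
            throws SAXException {
        if (matchDepth >= 0) super.endElement(uri, local, qName);
        if (path.size() == matchDepth) matchDepth = -1;  // leaving the matched element
        path.remove(path.size() - 1);
    }

    @Override public void characters(char[] ch, int start, int length)
            throws SAXException {
        if (matchDepth >= 0) super.characters(ch, start, length);
    }

    // Demo helper: returns the element names that reach the downstream handler.
    static List<String> elementNames(String xml, String xpath, String rootName) {
        List<String> names = new ArrayList<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            PathFilterSketch filter = new PathFilterSketch(reader, xpath, rootName);
            filter.setContentHandler(new DefaultHandler() {
                @Override public void startElement(String u, String l, String q, Attributes a) {
                    names.add(q);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }
}
```

The sketch compares qualified names only, so namespaces are ignored. A
predicate like [@att='value'] could be checked in startElement, where
the attributes are available, but that is not shown here.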
I have problems building a DOM tree from the result with Xerces and
Saxon, but with JDOM it works great. This needs to be fixed.
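For reference, one standard way to build a W3C DOM tree from a stream of
SAX events is an identity TransformerHandler writing to a DOMResult. The
sketch below only shows that technique; I have not checked whether it
avoids the problem described above. SaxToDomSketch and its methods are
hypothetical names, and a plain XMLReader again stands in for the
filter.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper, not part of SaxXPathFragmentFilter.
class SaxToDomSketch {

    // Builds a DOM Document from the SAX events that 'source' produces.
    static Document build(XMLReader source, InputSource input) {
        try {
            SAXTransformerFactory stf =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = stf.newTransformerHandler(); // identity
            DOMResult result = new DOMResult();  // the transformer creates the Document
            handler.setResult(result);
            source.setContentHandler(handler);
            source.parse(input);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Convenience for trying the sketch with an in-memory document.
    static Document buildFromString(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            return build(reader, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```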
You cannot rely on the result always being correct, so don't use
this in any application; just use it for experimentation.
You can find the code at:
http://www.npstrandberg.com/projects/saxxpathfragmentfilter/saxxpathfragmentfilter.tar.gz
My goal with this filter is to keep it reliable, simple, fast and
clean. If you want to contribute to this project, you are
welcome. The filter will be released under some kind of open source
license (if we get that far!).
Test it and give me some feedback on what you think.
Regards, Niels Peter Strandberg