OASIS Mailing List Archives

SaxXPathFragmentFilter - Reduce large DOM trees using a SAX XPath cutter!

I have made an experimental SAX XMLFilter. It lets you "filter" out 
the information in an XML document that you want to work with, using 
XPath, and skip the rest. You can place the filter anywhere in your 
application where an XMLFilter can be used.

- I don't know whether others have already done this.

The whole idea is to "filter" out the fragments of the XML document 
that you specify using an XPath expression, e.g. 
SaxXPathFragmentFilter(saxparser, "/cellphone/*/model[@id='1234']", 
"result"). You can build a DOM tree from the result, or feed the SAX 
events into an XSLT transformer and do some XSLT transformations.
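Feeding a filter's SAX events into an XSLT transformer can be done with the standard JAXP SAXSource class. The following is only a minimal sketch: the class name FilterToXsltSketch and its helper methods are made up for illustration, and a plain pass-through XMLFilterImpl stands in for SaxXPathFragmentFilter, whose code is not shown in this post.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo class, not part of the filter described in this post.
class FilterToXsltSketch {

    // Runs the stylesheet over whatever SAX events the filter lets through.
    static String transform(XMLFilter filter, String xml, String xslt) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        // With a SAXSource the transformer drives filter.parse(...) itself.
        transformer.transform(
                new SAXSource(filter, new InputSource(new StringReader(xml))),
                new StreamResult(out));
        return out.toString();
    }

    // Creates a namespace-aware SAX parser to sit underneath the filter.
    static XMLReader newReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        String xslt =
              "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'>"
            + "<xsl:value-of select='count(//model)'/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        String xml = "<cellphone><nokia><model id='1234'/></nokia></cellphone>";
        // XMLFilterImpl passes every event through unchanged; a real
        // fragment filter would be plugged in at this point instead.
        XMLFilter filter = new XMLFilterImpl(newReader());
        System.out.println(transform(filter, xml, xslt));
    }
}
```

Any XMLFilter can take the place of the pass-through filter here, since the transformer only sees the XMLReader interface.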

The big win is that you don't have to build a large DOM tree if you 
only need part of the information in a large XML document. You just 
specify which fragments you want using XPath, and the result will be a 
much smaller DOM tree, which requires less processing, memory, etc.

Let us say that you have a large document listing spare parts for Volvo 
vehicles. You want to make a list of engine parts for the S70 car model. 
What you do is specify the XPath (location path) that you want to cut out 
of the document, e.g. "/catalog/cars/s70/parts/engine".

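           // your SAX parser here, e.g. obtained from XMLReaderFactory
           XMLReader parser = XMLReaderFactory.createXMLReader();
           // Get instances of your handlers
           SAXHandler jdomsaxhandler = new SAXHandler();

           String xpath = "/catalog/cars/s70/parts/engine";
           String rootName = "s70engineparts"; // this will be the new root element
           // set SaxXPathFragmentFilter
           SaxXPathFragmentFilter xpathfilter =
                     new SaxXPathFragmentFilter(parser, xpath, rootName);
           // send the filtered events to the JDOM handler
           xpathfilter.setContentHandler(jdomsaxhandler);

           // Parse the document ("catalog.xml" is a placeholder file name)
           xpathfilter.parse("catalog.xml");
           // get the Document
           Document doc = jdomsaxhandler.getDocument();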
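To illustrate the general technique of cutting fragments out of a SAX stream, here is a minimal sketch built on XMLFilterImpl. It is not the SaxXPathFragmentFilter code: the class PathCutFilterSketch and its cut helper are hypothetical, only absolute paths with "*" steps are handled, and the way the new root element is emitted is an assumption. The filter keeps a stack of open element names, forwards events only while inside a matching element, and wraps everything in the new root.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical stand-in, not the author's SaxXPathFragmentFilter.
class PathCutFilterSketch extends XMLFilterImpl {
    private final String[] steps;    // e.g. {"catalog", "cars", "s70", "parts", "engine"}
    private final String rootName;   // name of the synthetic root element
    private final Deque<String> path = new ArrayDeque<>();
    private int matchDepth = -1;     // depth where the current fragment began, -1 when outside

    PathCutFilterSketch(XMLReader parent, String xpath, String rootName) {
        super(parent);
        this.steps = xpath.substring(1).split("/");
        this.rootName = rootName;
    }

    @Override public void startDocument() throws SAXException {
        super.startDocument();
        super.startElement("", rootName, rootName, new AttributesImpl());
    }

    @Override public void endDocument() throws SAXException {
        super.endElement("", rootName, rootName);
        super.endDocument();
    }

    @Override public void startElement(String uri, String local, String qName,
                                       Attributes atts) throws SAXException {
        path.addLast(qName);
        if (matchDepth < 0 && matches()) matchDepth = path.size();
        if (matchDepth >= 0) super.startElement(uri, local, qName, atts);
    }

    @Override public void endElement(String uri, String local, String qName)
            throws SAXException {
        if (matchDepth >= 0) super.endElement(uri, local, qName);
        if (path.size() == matchDepth) matchDepth = -1;  // fragment is closed
        path.removeLast();
    }

    @Override public void characters(char[] ch, int start, int length)
            throws SAXException {
        if (matchDepth >= 0) super.characters(ch, start, length);
    }

    // True when the open elements equal the path, step by step ("*" matches any name).
    private boolean matches() {
        if (path.size() != steps.length) return false;
        int i = 0;
        for (String name : path) {
            if (!steps[i].equals("*") && !steps[i].equals(name)) return false;
            i++;
        }
        return true;
    }

    // Demo helper: runs the filter and records the surviving events as text.
    static String cut(String xml, String xpath, String rootName) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        PathCutFilterSketch filter = new PathCutFilterSketch(parser, xpath, rootName);
        StringBuilder out = new StringBuilder();
        filter.setContentHandler(new DefaultHandler() {
            @Override public void startElement(String u, String l, String q, Attributes a) {
                out.append('<').append(q).append('>');
            }
            @Override public void endElement(String u, String l, String q) {
                out.append("</").append(q).append('>');
            }
            @Override public void characters(char[] c, int s, int n) {
                out.append(c, s, n);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }
}
```

Because nothing outside the matching elements is forwarded, whatever handler sits behind the filter (a DOM builder, for instance) only ever sees the small fragment document.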

This SaxXPathFragmentFilter is purely experimental. It is spaghetti code. 
I just sat down with an idea and started to code, and the code is not 
very pretty. It needs to be rewritten.

The XPath support is very limited for now. Here is the XPath you can use 
today with this filter:
      "/a/b" - An absolute path.
      "/a/*/c" - An absolute path where the second element, "*", can be 
any element.
      "/a/*/c[@att='value']" - If element c has an attribute att with the 
value 'value'.
      "/a/*/c[contains='value']" - If the first child node of element c is 
a text node that contains 'value'.
      "/a/*/c[starts-with='value']" - If the first child node of element c 
is a text node that starts with 'value'.
      "/a/*/c[ends-with='value']" - If the first child node of element c is 
a text node that ends with 'value'.
      "/a/*/c['value']" - If the first child node of element c is a text 
node that is exactly 'value'.
      "/a/*/c[is='value']" - As above.
As you can see, the XPath options are very limited. But I think that when 
I find a way to implement the "//" pattern, the filter will be even more 

I have problems building a DOM tree from the result using Xerces 
and Saxon, but with JDOM it works great. This needs to be fixed.
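For reference, one standard JAXP route from SAX events to a W3C DOM tree is an identity TransformerHandler writing into a DOMResult. The sketch below only shows that route with a plain pass-through XMLFilterImpl; SaxToDomSketch is a hypothetical name, and it has not been tried with SaxXPathFragmentFilter, so it does not say anything about the cause of the problem above.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo class, not part of the filter described in this post.
class SaxToDomSketch {

    // Parses xml through the given reader (or filter) and collects the
    // events it emits into a new DOM Document.
    static Document toDom(XMLReader reader, String xml) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler(); // identity transform
        DOMResult result = new DOMResult();
        handler.setResult(result);
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return (Document) result.getNode();
    }

    // Creates a namespace-aware SAX parser wrapped in a pass-through filter.
    static XMLReader newPassThroughFilter() throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        return new XMLFilterImpl(spf.newSAXParser().getXMLReader());
    }

    public static void main(String[] args) throws Exception {
        Document doc = toDom(newPassThroughFilter(),
                "<s70engineparts><engine id='1'/></s70engineparts>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```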

You cannot rely on the result always being correct, so don't use 
this in any real application; use it only for experimentation.

You can find the code at: 

My goal with this filter is to keep it reliable, simple, fast and 
clean. If you want to contribute to this project, you are 
welcome. The filter will be released under some kind of open source 
license (if we get that far!).

Test it and give me some feedback on what you think.

Regards, Niels Peter Strandberg