Re: [xml-dev] SaxXPathFragmentFilter - Reduce large DOM trees using a SAX XPath cutter!

I really like what you've done, but the language you're
using is not XPath (nor is it a subset of XPath),
and I see a problem here. (I think I also have some
kind of solution to that problem, and I'll describe it
in my next letter.)
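
To make the point concrete, here is a minimal sketch that hands two of the filter's predicate spellings to a standard XPath 1.0 engine and compares them with the standard spelling of the intended test. It assumes a JDK that ships javax.xml.xpath; the class name PredicateMeaning and the sample document are invented for the illustration and are not part of the filter.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: shows how a standard XPath 1.0 engine reads
// predicates written in the filter's dialect.
class PredicateMeaning {

    // Count the nodes that the JDK's XPath engine selects for expr in xml.
    static int count(String xml, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b><c>some value here</c></b></a>";

        // In XPath, "contains" followed by "=" is a child element name,
        // so this asks for a <contains> child of c equal to 'value'.
        System.out.println(count(xml, "/a/*/c[contains='value']"));     // 0

        // Standard spelling: the function call on the string value of c.
        System.out.println(count(xml, "/a/*/c[contains(., 'value')]")); // 1

        // A bare string literal predicate is just a non-empty string,
        // which converts to true, so every c is selected.
        System.out.println(count(xml, "/a/*/c['value']"));              // 1

        // Standard spelling of "the text of c is exactly 'value'".
        System.out.println(count(xml, "/a/*/c[. = 'value']"));          // 0
    }
}
```

So an expression such as "/a/*/c[contains='value']" is legal XPath, but it means something quite different from what the filter does with it.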


----- Original Message -----
From: "Niels Peter Strandberg" <nielspeter@npstrandberg.com>
To: <xml-dev@lists.xml.org>
Sent: Wednesday, November 28, 2001 5:40 AM
Subject: [xml-dev] SaxXPathFragmentFilter - Reduce large DOM trees using a
SAX XPath cutter!

> I have made an experimental SAX XMLFilter. It lets you "filter" out
> the information in an XML document that you want to work with, using
> XPath, and skip the rest. You can place the filter anywhere in your
> application where an XMLFilter can be used.
> - I don't know whether this has already been done by others.
> The whole idea is to "filter" out the fragments of the XML document
> that you specify with an XPath expression, e.g.
> SaxXPathFragmentFilter(saxparser, "/cellphone/*/model[@id='1234']",
> "result"). Build a DOM tree from the result, or feed the SAX
> events into an XSLT transformer and do some XSLT transformations.
> The big win is that you don't have to build a large DOM tree if you
> only need part of the information in a large XML document. You just
> specify which fragments you want using XPath, and the result is a
> much smaller DOM tree, which requires less processing, memory, etc.
> Let us say that you have a large document with spare parts for Volvo
> vehicles. You want to make a list of engine parts for the S70 car model.
> What you do is specify the XPath (location path) that you want to cut out
> of the document, e.g. "/catalog/cars/s70/parts/engine".
>            // Imports assumed for this snippet (SAX 2 and JDOM):
>            //   org.xml.sax.XMLReader, org.xml.sax.helpers.XMLReaderFactory,
>            //   org.jdom.Document, org.jdom.input.SAXHandler
>            // Your SAX parser here
>            XMLReader parser =
>                      XMLReaderFactory.createXMLReader(
>                                "org.apache.xerces.parsers.SAXParser");
>            // Get instances of your handlers
>            SAXHandler jdomsaxhandler = new SAXHandler();
>            String xpath = "/catalog/cars/s70/parts/engine";
>            // rootName will be the name of the new root element
>            String rootName = "s70engineparts";
>            // Wrap the parser in the SaxXPathFragmentFilter
>            SaxXPathFragmentFilter xpathfilter =
>                      new SaxXPathFragmentFilter(parser, xpath, rootName);
>            xpathfilter.setContentHandler(jdomsaxhandler);
>            // Parse the document (uri is the system identifier of the input)
>            xpathfilter.parse(uri);
>            // Get the resulting JDOM Document
>            Document doc = jdomsaxhandler.getDocument();
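
For readers who want to see the general technique, here is a minimal sketch of a SAX filter that forwards only the events inside elements matching an absolute path and wraps them in a new root. It is not the SaxXPathFragmentFilter code from this message: the class name PathCutFilter, the cut helper, and the sample catalog are assumptions made for the illustration, and only plain steps and "*" are handled (no predicates). It relies on org.xml.sax.helpers.XMLFilterImpl and a JAXP parser from the JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical stand-in, not the SaxXPathFragmentFilter from this thread.
class PathCutFilter extends XMLFilterImpl {
    private final String[] steps;                         // path steps, "*" = any element
    private final String rootName;                        // name of the synthetic root
    private final List<String> path = new ArrayList<>();  // names of the open elements
    private int depthInside = 0;                          // > 0 while inside a matched fragment

    PathCutFilter(XMLReader parent, String absolutePath, String rootName) {
        super(parent);
        this.steps = absolutePath.substring(1).split("/");
        this.rootName = rootName;
    }

    // True when the currently open elements line up with the path steps.
    private boolean pathMatches() {
        if (path.size() != steps.length) return false;
        for (int i = 0; i < steps.length; i++) {
            if (!steps[i].equals("*") && !steps[i].equals(path.get(i))) return false;
        }
        return true;
    }

    @Override public void startDocument() throws SAXException {
        super.startDocument();
        super.startElement("", rootName, rootName, new AttributesImpl());
    }

    @Override public void endDocument() throws SAXException {
        super.endElement("", rootName, rootName);
        super.endDocument();
    }

    @Override public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        path.add(qName);
        if (depthInside > 0 || pathMatches()) {
            depthInside++;                                // entering or nested in a fragment
            super.startElement(uri, local, qName, atts);
        }
    }

    @Override public void endElement(String uri, String local, String qName) throws SAXException {
        if (depthInside > 0) {
            depthInside--;
            super.endElement(uri, local, qName);
        }
        path.remove(path.size() - 1);
    }

    @Override public void characters(char[] ch, int start, int len) throws SAXException {
        if (depthInside > 0) super.characters(ch, start, len);
    }

    // Demo helper: run the filter over a string and serialize the surviving events.
    static String cut(String xml, String absolutePath, String rootName) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        PathCutFilter filter = new PathCutFilter(parser, absolutePath, rootName);
        StringBuilder out = new StringBuilder();
        filter.setContentHandler(new DefaultHandler() {
            @Override public void startElement(String u, String l, String q, Attributes a) {
                out.append('<').append(q).append('>');
            }
            @Override public void endElement(String u, String l, String q) {
                out.append("</").append(q).append('>');
            }
            @Override public void characters(char[] ch, int start, int len) {
                out.append(ch, start, len);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><cars>"
                + "<s70><parts><engine><part>piston</part></engine>"
                + "<body><part>door</part></body></parts></s70>"
                + "<s80><parts><engine><part>turbo</part></engine></parts></s80>"
                + "</cars></catalog>";
        // prints <s70engineparts><engine><part>piston</part></engine></s70engineparts>
        System.out.println(cut(xml, "/catalog/cars/s70/parts/engine", "s70engineparts"));
    }
}
```

The filter keeps only a list of open element names and a depth counter, so memory use stays independent of document size; everything outside the matched fragments is dropped before any tree builder sees it.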
> This SaxXPathFragmentFilter is purely experimental. It is spaghetti code.
> I just sat down with an idea and started to code, and the code is not
> very pretty. It needs to be rewritten.
> The XPath support is very limited for now. Here are the expressions the
> filter handles today:
>       "/a/b" - An absolute path.
>       "/a/*/c" - An absolute path in which the second step, "*", can be
> any element.
>       "/a/*/c[@att='value']" - Matches if element c has an attribute att
> whose value is 'value'.
>       "/a/*/c[contains='value']" - Matches if the first child node of c
> is a text node that contains 'value'.
>       "/a/*/c[starts-with='value']" - Matches if the first child node of
> c is a text node that starts with 'value'.
>       "/a/*/c[ends-with='value']" - Matches if the first child node of c
> is a text node that ends with 'value'.
>       "/a/*/c['value']" - Matches if the first child node of c is a text
> node whose content is exactly 'value'.
>       "/a/*/c[is='value']" - Same as above.
> As you can see, the XPath options are very limited. But I think that when
> I find a way to implement the "//" pattern, the filter will be even more
> powerful.
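
One possible way to approach the "//" pattern, sketched under assumptions: keep the names of the currently open elements (as a SAX filter can do with a simple list) and match them against the pattern, letting "//" stand for zero or more intermediate elements. The class and method names below (DescendantMatch, steps, matches) are hypothetical, predicates are ignored, and this is only one of several ways to do it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: matches a list of open element names against a
// pattern made of "/", "*" and "//" (no predicates).
class DescendantMatch {
    // Marker for "zero or more elements"; it can never be an element name.
    static final String ANY_DEPTH = "//";

    // "/a//c" -> [a, //, c]; an empty piece between two slashes marks "//".
    static List<String> steps(String pattern) {
        String[] parts = pattern.split("/");
        List<String> out = new ArrayList<>();
        for (int i = 1; i < parts.length; i++) {
            out.add(parts[i].isEmpty() ? ANY_DEPTH : parts[i]);
        }
        return out;
    }

    // path holds the open element names from the document root downwards.
    static boolean matches(String pattern, List<String> path) {
        return match(steps(pattern), 0, path, 0);
    }

    private static boolean match(List<String> steps, int si, List<String> path, int pi) {
        if (si == steps.size()) return pi == path.size();
        String step = steps.get(si);
        if (step.equals(ANY_DEPTH)) {
            // Try skipping 0, 1, 2, ... elements of the path.
            for (int k = pi; k <= path.size(); k++) {
                if (match(steps, si + 1, path, k)) return true;
            }
            return false;
        }
        if (pi == path.size()) return false;
        if (step.equals("*") || step.equals(path.get(pi))) {
            return match(steps, si + 1, path, pi + 1);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("/a//c", Arrays.asList("a", "b", "c"))); // true
        System.out.println(matches("/a//c", Arrays.asList("a", "c")));      // true
        System.out.println(matches("/a/*/c", Arrays.asList("a", "c")));     // false
        System.out.println(matches("//c", Arrays.asList("x", "y", "c")));   // true
    }
}
```

Because the check needs only the stack of open element names, it could be evaluated at each startElement event without buffering any part of the document.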
> I have problems building a DOM tree from the result using Xerces
> and Saxon, but with JDOM it works great. This needs to be fixed.
> You cannot rely on the result always being correct, so don't use
> this in any application; just use it for experimentation.
> You can find the code at:
> tar.gz
> My goal with this filter is to keep it reliable, simple, fast and
> clean. If you want to contribute to this project, you are
> welcome. The filter will be released under some kind of open source
> license (if we get that far!).
> Test it and give me some feedback on what you think.
> Regards, Niels Peter Strandberg