Re: [xml-dev] SaxXPathFragmentFilter - Reduce large DOM trees using a SAX XPath cutter!




<plug type="blatant">

	And if you do want to have actual XPath-esque support, I heartily
	suggest taking a look at SAXPath to do your xpath parsing for you.

	http://saxpath.org/

</plug>

	-bob



On Wed, 28 Nov 2001, PaulT wrote:

> 
> I really like what you've done, but the language you're
> using is not XPath (nor is it a subset of XPath),
> and I see a problem here. (I think I also have some
> kind of solution to that problem, and I'll describe it
> in my next message.)
> 
> Rgds.Paul.
> 
> ----- Original Message -----
> From: "Niels Peter Strandberg" <nielspeter@npstrandberg.com>
> To: <xml-dev@lists.xml.org>
> Sent: Wednesday, November 28, 2001 5:40 AM
> Subject: [xml-dev] SaxXPathFragmentFilter - Reduce large DOM trees using a
> SAX XPath cutter!
> 
> 
> > I have made an experimental SAX XMLFilter. It lets you use an XPath
> > expression to "filter" out the parts of an XML document you want to
> > work with and skip the rest. You can place the filter anywhere in your
> > application where an XMLFilter can be used.
> >
> > I don't know whether this has already been done by others.
> >
> > The whole idea is to "filter" out the fragments of the XML document
> > that you specify with an XPath expression, e.g.
> > SaxXPathFragmentFilter(saxparser, "/cellphone/*/model[@id='1234']",
> > "result"). You can build a DOM tree from the result, or feed the SAX
> > events into an XSLT transformer and run XSLT transformations on them.
> >
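The chaining idea in the paragraph above can be sketched with the standard JAXP API: because an XMLFilter is also an XMLReader, it can be handed to a SAXSource and run through a Transformer (here the identity transform; a stylesheet works the same way). PathFragmentFilter is a hypothetical, much-simplified stand-in for SaxXPathFragmentFilter, not the author's code: it assumes an exact absolute path such as "/catalog/cars/s70/parts/engine" (no "*" or predicates), forwards only the events inside matching elements, and wraps them in a new root element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical, simplified fragment filter: exact absolute paths only, no "*" or predicates. */
public class PathFragmentFilter extends XMLFilterImpl {
    private final String path;                           // e.g. "/catalog/cars/s70/parts/engine"
    private final String rootName;                       // new root element wrapping the fragments
    private final List<String> open = new ArrayList<>(); // names of the currently open elements
    private int depthInMatch = 0;                        // > 0 while inside a matching fragment

    public PathFragmentFilter(String path, String rootName) {
        this.path = path;
        this.rootName = rootName;
    }

    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
        super.startElement("", rootName, rootName, new AttributesImpl()); // open the new root
    }

    @Override
    public void endDocument() throws SAXException {
        super.endElement("", rootName, rootName); // close the new root
        super.endDocument();
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        open.add(qName);
        // Forward the element if its path matches, or if it sits inside a matching fragment.
        if (depthInMatch > 0 || ("/" + String.join("/", open)).equals(path)) {
            depthInMatch++;
            super.startElement(uri, local, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (depthInMatch > 0) {
            depthInMatch--;
            super.endElement(uri, local, qName);
        }
        open.remove(open.size() - 1);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depthInMatch > 0) {
            super.characters(ch, start, length); // text outside matching fragments is dropped
        }
    }

    /** Runs the filter over an XML string and serializes the result with an identity transform. */
    public static String extract(String xml, String path, String rootName) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            PathFragmentFilter filter = new PathFragmentFilter(path, rootName);
            filter.setParent(spf.newSAXParser().getXMLReader());
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog><cars><s70><parts>"
                + "<engine><part>piston</part></engine>"
                + "<brakes><part>pad</part></brakes>"
                + "</parts></s70></cars></catalog>";
        System.out.println(extract(xml, "/catalog/cars/s70/parts/engine", "s70engineparts"));
    }
}
```

The main method prints the engine fragment wrapped in the new s70engineparts root, with the brakes fragment dropped. One limitation of this sketch: the transformer registers itself as the LexicalHandler, and XMLFilterImpl passes that property straight to the parent parser, so comments and CDATA boundaries are not filtered.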
> > The big win is that you don't have to build a large DOM tree if you
> > only need part of the information in a large XML document. You just
> > specify which fragments you want using XPath, and the result is a
> > much smaller DOM tree, which requires less processing, memory, etc.
> >
> > Let us say that you have a large document listing spare parts for Volvo
> > vehicles, and you want a list of engine parts for the S70 car model.
> > You specify the XPath (location path) of the part you want to cut out
> > of the document, e.g. "/catalog/cars/s70/parts/engine".
> >
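> >            import org.jdom.Document;
> >            import org.jdom.input.SAXHandler;
> >            import org.xml.sax.XMLReader;
> >            import org.xml.sax.helpers.XMLReaderFactory;
> >
> >            // your SAX parser here
> >            XMLReader parser =
> >                      XMLReaderFactory.createXMLReader(
> >                                "org.apache.xerces.parsers.SAXParser");
> >
> >            // JDOM's SAXHandler builds a JDOM Document from the SAX events
> >            SAXHandler jdomsaxhandler = new SAXHandler();
> >
> >            String uri = "parts-catalog.xml"; // hypothetical location of the catalog
> >            String xpath = "/catalog/cars/s70/parts/engine";
> >            String rootName = "s70engineparts"; // this will be the new root
> >
> >            // set up the SaxXPathFragmentFilter in front of the JDOM handler
> >            SaxXPathFragmentFilter xpathfilter =
> >                      new SaxXPathFragmentFilter(parser, xpath, rootName);
> >            xpathfilter.setContentHandler(jdomsaxhandler);
> >
> >            // parse the document; only the matching fragments reach the handler
> >            xpathfilter.parse(uri);
> >
> >            // get the (much smaller) Document
> >            Document doc = jdomsaxhandler.getDocument();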
> >
> >
> > This SaxXPathFragmentFilter is purely experimental. It is spaghetti code.
> > I just sat down with an idea and started to code, and the code is not
> > very pretty. It needs to be rewritten.
> >
> >
> > XPath support is very limited for now. These are the expressions the
> > filter supports today:
> >       "/a/b" - An absolute path.
> >       "/a/*/c" - An absolute path where the second step, "*", matches any element.
> >       "/a/*/c[@att='value']" - If element c has an attribute att equal to 'value'.
> >       "/a/*/c[contains='value']" - If the first child node of element c is a text node that contains 'value'.
> >       "/a/*/c[starts-with='value']" - If the first child node of element c is a text node that starts with 'value'.
> >       "/a/*/c[ends-with='value']" - If the first child node of element c is a text node that ends with 'value'.
> >       "/a/*/c['value']" - If the first child node of element c is a text node equal to 'value'.
> >       "/a/*/c[is='value']" - Same as above.
> >
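Paul's objection above is easy to see when the filter's predicate forms are written in standard XPath 1.0. In real XPath, c['value'] selects every c (a non-empty string literal in a predicate is always true), and c[contains='value'] tests for a child element named contains. The translations below are our own, not from the post; the sketch checks them with the JDK's javax.xml.xpath API against a small made-up sample document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Our own XPath 1.0 spellings of the filter's predicate forms (not from the original post). */
public class XPathEquivalents {
    // "first child node is a text node" is written as node()[1][self::text()]
    static final String ATTRIBUTE   = "/a/*/c[@att='value']";
    static final String CONTAINS    = "/a/*/c[node()[1][self::text()][contains(., 'value')]]";
    static final String STARTS_WITH = "/a/*/c[node()[1][self::text()][starts-with(., 'value')]]";
    // XPath 1.0 has no ends-with(), so compare the trailing substring instead.
    static final String ENDS_WITH   = "/a/*/c[node()[1][self::text()]"
            + "[substring(., string-length(.) - string-length('value') + 1) = 'value']]";
    static final String IS          = "/a/*/c[node()[1][self::text()][. = 'value']]";

    // Made-up sample: x/c has the attribute and text "my value", y/c is exactly "value",
    // and z/c starts with an element, so its first child is not a text node.
    static final String SAMPLE = "<a><x><c att='value'>my value</c></x>"
            + "<y><c>value</c></y><z><c><d/>value</c></z></a>";

    /** Returns how many nodes the expression selects in the given document. */
    static int count(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] exprs = {ATTRIBUTE, CONTAINS, STARTS_WITH, ENDS_WITH, IS,
                "/a/*/c['value']", "/a/*/c[contains='value']"};
        for (String expr : exprs) {
            System.out.println(count(SAMPLE, expr) + "  " + expr);
        }
    }
}
```

The last two expressions in main are the filter's own syntax read as real XPath: the first selects all three c elements and the second selects none.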
> > As you can see, the XPath support is very limited. But I think that once
> > I find a way to implement the "//" pattern, the filter will be much more
> > powerful.
> >
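One possible way to support "//" in a streaming filter, sketched under our own assumptions rather than the author's design: keep the names of the currently open elements on a stack (push in startElement, pop in endElement) and, on each startElement, test that stack against the path steps. A "//" becomes a gap that may swallow zero or more elements, which a small backtracking matcher can handle. StackPathMatcher is a hypothetical name, and predicates are left out.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: matches a stack of open element names (root first) against
 *  an absolute path built from "/", "//" and "*" steps, e.g. "/catalog//engine". */
public class StackPathMatcher {

    public static boolean matches(String path, List<String> openElements) {
        // "/a//b/*" -> ["a", "", "b", "*"]; an empty token marks a "//" gap.
        List<String> steps = Arrays.asList(path.substring(1).split("/"));
        return matchFrom(steps, 0, openElements, 0);
    }

    private static boolean matchFrom(List<String> steps, int s, List<String> open, int e) {
        if (s == steps.size()) {
            return e == open.size(); // path used up: it must end exactly at the current element
        }
        String step = steps.get(s);
        if (step.isEmpty()) {
            // "//": skip zero or more open elements, then try the remaining steps.
            for (int k = e; k <= open.size(); k++) {
                if (matchFrom(steps, s + 1, open, k)) {
                    return true;
                }
            }
            return false;
        }
        if (e == open.size()) {
            return false; // steps remain but no open elements are left
        }
        boolean nameOk = step.equals("*") || step.equals(open.get(e));
        return nameOk && matchFrom(steps, s + 1, open, e + 1);
    }

    public static void main(String[] args) {
        List<String> open = Arrays.asList("catalog", "cars", "s70", "parts", "engine");
        System.out.println(matches("/catalog/cars/s70/parts/engine", open)); // exact path
        System.out.println(matches("/catalog/*/s70/parts/engine", open));    // wildcard step
        System.out.println(matches("//engine", open));                       // anywhere
        System.out.println(matches("/catalog//parts", open));                // ends too early
    }
}
```

This prints true, true, true and false: the last path names parts, but the current element is engine. The backtracking can get expensive for paths with many "//" steps over deep documents; compiling the path into a small state machine would avoid that, but is beyond this sketch.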
> > I have had problems building a DOM tree from the result using Xerces
> > and Saxon, but with JDOM it works great. This needs to be fixed.
> >
> > You cannot rely on the result always being correct, so don't use
> > this in any application; use it only for experimentation.
> >
> > You can find the code at:
> >
> > http://www.npstrandberg.com/projects/saxxpathfragmentfilter/saxxpathfragmentfilter.tar.gz
> >
> > My goal with this filter is to keep it reliable, simple, fast and
> > clean. If you want to contribute to this project, you are very
> > welcome. The filter will be released under some kind of open source
> > license (if we get that far!).
> >
> > Test it and give me some feedback on what you think.
> >
> >
> > Regards, Niels Peter Strandberg
> >
> >
> > -----------------------------------------------------------------
> > The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> > initiative of OASIS <http://www.oasis-open.org>
> >
> > The list archives are at http://lists.xml.org/archives/xml-dev/
> >
> > To subscribe or unsubscribe from this list use the subscription
> > manager: <http://lists.xml.org/ob/adm.pl>
> >
> 
>