
Re: [xml-dev] SaxXPathFragmentFilter - Reduse large DOM trees using a SAX XPath cutter!

To: pault12@pacbell.net, xml-dev@lists.xml.org
Subject: Re: [xml-dev] SaxXPathFragmentFilter - Reduse large DOM trees using a SAX XPath cutter!
From: Niels Peter Strandberg <nielspeter@npstrandberg.com>
Date: Wed, 28 Nov 2001 17:16:21 +0100
In-reply-to: <007f01c17813$8bdfdb60$2cd0a340@pault>

You are right that it is not XPath, but more a type of location path.
But many developers are familiar with XPath, and XPath in its simplest
form is a location path.  Let's say that you have an XML document and you
know which parts you want to cut out; then an XPath-like syntax is the
simplest to use, because you might already know basic XPath.

I have a simple filter that runs through the XML document and produces
an XPath (location path) that you can use to find the place in your
document where you want to do the cutting. That will be available later.
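A minimal sketch of what such a path-listing handler could look like. The class name LocationPathLister and its list() helper are hypothetical and are not part of the filter; the sketch assumes a parser that is not namespace-aware, so element names are taken from qName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects every distinct location path in a document.
class LocationPathLister extends DefaultHandler {
    private final List<String> current = new ArrayList<>();
    private final Set<String> seen = new LinkedHashSet<>(); // keeps document order

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        current.add(qName);
        seen.add("/" + String.join("/", current));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        current.remove(current.size() - 1);
    }

    // Parses the given XML text and returns the paths in order of first appearance.
    static Set<String> list(String xml) {
        LocationPathLister lister = new LocationPathLister();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), lister);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return lister.seen;
    }

    public static void main(String[] args) {
        String xml = "<catalog><cars><s70><parts><engine/></parts></s70></cars></catalog>";
        for (String path : list(xml)) {
            System.out.println(path);
        }
    }
}
```

Each printed line is a candidate location path that could be handed to the fragment filter.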

There is a world of advanced tools out there, but no developer can know
them all. If you know SAX and XMLFilters, then this is a simple
tool that will help you get the work done. This filter is not a
replacement for other tools, but a supplement.

The filter doesn't build a tree, but uses an ArrayList as a location
path. When startElement() is called, the local name is added to the end of
the list, and when endElement() is called, it is removed. The dynamically
built list is then compared to the XPath (location path) given by
the user. So we are not walking trees but lists.
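The list-based matching described above can be sketched roughly as follows. This is not the filter's actual code: the class PathTracker, its matches() and count() methods are made up for illustration, only plain names and the "*" wildcard are handled, and the pattern is assumed to start with "/". Because the default JAXP parser is not namespace-aware, the sketch pushes qName rather than the local name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: keeps the current location path in a list and
// counts the elements whose path equals the user's pattern.
class PathTracker extends DefaultHandler {
    private final String[] steps;                 // "/a/*/c" -> ["a", "*", "c"]
    private final List<String> path = new ArrayList<>();
    int hits = 0;

    PathTracker(String locationPath) {
        this.steps = locationPath.substring(1).split("/"); // assumes a leading "/"
    }

    // True when the list and the pattern have the same length and every
    // step is either "*" or equal to the name at the same position.
    static boolean matches(List<String> path, String[] steps) {
        if (path.size() != steps.length) return false;
        for (int i = 0; i < steps.length; i++) {
            if (!steps[i].equals("*") && !steps[i].equals(path.get(i))) return false;
        }
        return true;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.add(qName);                          // push on start
        if (matches(path, steps)) hits++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.remove(path.size() - 1);             // pop on end
    }

    // Parses the XML text and returns how many elements matched the pattern.
    static int count(String xml, String locationPath) {
        PathTracker tracker = new PathTracker(locationPath);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), tracker);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return tracker.hits;
    }

    public static void main(String[] args) {
        String xml = "<a><b><c/></b><d><c/><e/></d><c/></a>";
        System.out.println(count(xml, "/a/*/c"));
    }
}
```

A real filter would start forwarding SAX events at the point where this sketch increments the counter, and stop again when the matching element ends.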

I think it will be possible to do things like
"/xhtml//a[@href='index.html']" in a future version of the filter.
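One possible way to match a "//" step against the path list, offered only as a sketch of the idea and not as something the filter does today. The class DescendantMatcher is hypothetical, the attribute predicate is left out, and the pattern is assumed to start with "/". An empty step (produced by splitting "//") is treated as "skip zero or more elements".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: matches a path list against a pattern that may
// contain "*" (any one element) and "//" (any number of elements).
class DescendantMatcher {

    static boolean matches(List<String> path, String locationPath) {
        // "/xhtml//a" -> ["xhtml", "", "a"]; the empty step stands for "//"
        String[] steps = locationPath.substring(1).split("/");
        return matches(path, 0, steps, 0);
    }

    private static boolean matches(List<String> path, int p, String[] steps, int s) {
        if (s == steps.length) return p == path.size();
        if (steps[s].isEmpty()) {
            // "//": try skipping zero, one, two, ... path entries
            for (int skip = p; skip <= path.size(); skip++) {
                if (matches(path, skip, steps, s + 1)) return true;
            }
            return false;
        }
        if (p == path.size()) return false;
        if (!steps[s].equals("*") && !steps[s].equals(path.get(p))) return false;
        return matches(path, p + 1, steps, s + 1);
    }

    public static void main(String[] args) {
        List<String> path = Arrays.asList("xhtml", "body", "p", "a");
        System.out.println(matches(path, "/xhtml//a"));
    }
}
```

The backtracking loop is the price of "//": a fixed-length comparison is no longer enough, but the work is still done on a short list and not on a tree.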

I do believe that some of the features in XPath and XSLT can be 
reproduced using SAX. In this filter I buffer the startElement() and 
check the text node that follows for a text match. If a match is found 
the buffered startElement() is sent.  Using some kind of buffering, it
will be possible to do some basic XSLT transformation.
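The buffering idea can be illustrated with a small XMLFilterImpl subclass. This is a sketch under assumptions, not the code in the filter: the class TextMatchFilter, its constructor arguments and the run() helper are invented for the example, and only a "contains" test on the first text node is shown. Elements that fail the test are dropped together with their content.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch: holds back startElement() for one element name until
// the text that follows is known, then forwards or drops the element.
class TextMatchFilter extends XMLFilterImpl {
    private final String element;     // element whose first text node is tested
    private final String needle;      // text that must be contained
    private String[] pendingName;     // buffered startElement: uri, localName, qName
    private Attributes pendingAtts;
    private final StringBuilder text = new StringBuilder();
    private int skipDepth = 0;        // > 0 while inside a rejected element

    TextMatchFilter(XMLReader parent, String element, String needle) {
        super(parent);
        this.element = element;
        this.needle = needle;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException {
        decide();
        if (skipDepth > 0) { skipDepth++; return; }
        if (qName.equals(element)) {
            pendingName = new String[] {uri, local, qName};
            pendingAtts = new AttributesImpl(atts); // copy: the parser may reuse atts
            text.setLength(0);
            return;
        }
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (pendingName != null) { text.append(ch, start, length); return; }
        if (skipDepth == 0) super.characters(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        decide();
        if (skipDepth > 0) { skipDepth--; return; }
        super.endElement(uri, local, qName);
    }

    // Called when the first text node is complete: send or drop the buffered start.
    private void decide() throws SAXException {
        if (pendingName == null) return;
        String[] n = pendingName;
        pendingName = null;
        if (text.toString().contains(needle)) {
            super.startElement(n[0], n[1], n[2], pendingAtts);
            char[] c = text.toString().toCharArray();
            super.characters(c, 0, c.length);
        } else {
            skipDepth = 1;
        }
    }

    // Runs the filter over XML text and returns the surviving events as markup.
    static String run(String xml, String element, String needle) {
        StringBuilder out = new StringBuilder();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            TextMatchFilter filter = new TextMatchFilter(reader, element, needle);
            filter.setContentHandler(new DefaultHandler() {
                @Override public void startElement(String u, String l, String q, Attributes a) { out.append('<').append(q).append('>'); }
                @Override public void characters(char[] ch, int s, int n) { out.append(ch, s, n); }
                @Override public void endElement(String u, String l, String q) { out.append("</").append(q).append('>'); }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

For example, TextMatchFilter.run(xml, "model", "Nokia") keeps only the model elements whose first text node contains "Nokia". The text is collected until the next start or end tag because a parser is free to deliver one text node in several characters() calls.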

Sometimes we are so eager to build "can all - want all" tools. I think
that many of the W3C standards are so complicated that only a few people
really know all the functions. I like the JDOM way of thinking: "keep it
simple!". You don't have to be a rocket scientist to use it.

We need more feature cutters, like JDOM. Tools that can bring the best 
of DOM, SAX, XSLT, XPath to the masses. We need XPath Light, XSLT Light, 
DOM Light etc.

Take JAXP as an example.  To get an empty org.w3c.dom.Document you do:

           DocumentBuilderFactory dfactory = DocumentBuilderFactory.newInstance();
           DocumentBuilder docBuilder = dfactory.newDocumentBuilder();
           org.w3c.dom.Document doc = docBuilder.newDocument();

           or

           org.w3c.dom.Document doc =
                     DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

Now that is a waste of space! The advantage that Microsoft developers
have over us Java developers is that Microsoft knows how to simplify the
life of the developer. The idea behind JAXP should be simplicity, not
another "many hours of testing and reading".

"Keep it simple!"

regards, Niels Peter


On Wednesday, November 28, 2001, at 02:49, PaulT wrote:

>
> I really like what you've done, but the language you're
> using is not XPath (nor is it a subset of XPath),
> and I see a problem here. (I think I also have some
> kind of solution to that problem, and I'll express it
> in my next letter.)
>
> Rgds.Paul.
>
> ----- Original Message -----
> From: "Niels Peter Strandberg" <nielspeter@npstrandberg.com>
> To: <xml-dev@lists.xml.org>
> Sent: Wednesday, November 28, 2001 5:40 AM
> Subject: [xml-dev] SaxXPathFragmentFilter - Reduse large DOM trees using a SAX XPath cutter!
>
>
>> I have made an experimental SAX XMLFilter. It allows you to "filter"
>> out the information in an xml document that you want to work with,
>> using xpath, and skip the rest. You can place the filter anywhere in
>> your application where an XMLFilter can be used.
>>
>> - I don't know if this has already been done by others?
>>
>> The whole idea is to "filter" out the fragments of the xml document
>> that you specify using an xpath expression, e.g.
>> SaxXPathFragmentFilter(saxparser, "/cellphone/*/model[@id='1234']",
>> "result").  Build a dom tree from the result, or why not feed the sax
>> events into an xslt transformer and do some xslt transformations.
>>
>> The big win is that you don't have to build a large dom tree if you
>> only need part of the information in a large xml document. You just
>> specify what fragments you want using xpath, and the result will be a
>> much smaller dom tree, which requires less processing, memory etc.
>>
>> Let us say that you have a large document with spare parts for Volvo
>> vehicles. You want to make a list of engine parts for the S70 car model.
>> What you do is specify the xpath (location path) that you want to cut
>> out of the document, e.g. "/catalog/cars/s70/parts/engine".
>>
>>            // your sax parser here
>>            XMLReader parser =
>>                      XMLReaderFactory.createXMLReader(
>>                                "org.apache.xerces.parsers.SAXParser");
>>
>>            // Get instances of your handlers
>>            SAXHandler jdomsaxhandler = new SAXHandler();
>>
>>            String xpath = "/catalog/cars/s70/parts/engine";
>>            String rootName = "s70engineparts"; // this will be the new root.
>>
>>            // set SaxXPathFragmentFilter
>>            SaxXPathFragmentFilter xpathfilter =
>>                      new SaxXPathFragmentFilter(parser, xpath, rootName);
>>            xpathfilter.setContentHandler(jdomsaxhandler);
>>
>>            // Parse the document
>>            xpathfilter.parse(uri);
>>
>>            // get the Document
>>            Document doc = jdomsaxhandler.getDocument();
>>
>>
>> This SaxXPathFragmentFilter is purely experimental. It is spaghetti code.
>> I just sat down with an idea and started to code, and the code is not
>> very pretty. It needs to be rewritten.
>>
>>
>> The xpath support is very limited for now. Here is the xpath you can do
>> today with this filter:
>>       "/a/b" - An absolute path.
>>       "/a/*/c" - An absolute path where element number 2 ("*") can be
>>       any element.
>>       "/a/*/c[@att='value']" - If element c has an attribute att with
>>       the value 'value'.
>>       "/a/*/c[contains='value']" - If element c's first child node is a
>>       text node that contains 'value'.
>>       "/a/*/c[starts-with='value']" - If element c's first child node is
>>       a text node that starts with 'value'.
>>       "/a/*/c[ends-with='value']" - If element c's first child node is a
>>       text node that ends with 'value'.
>>       "/a/*/c['value']" - If element c's first child node is a text node
>>       that is 'value'.
>>       "/a/*/c[is='value']" - As above.
>>
>> As you can see, the xpath options are very limited. But I think that when
>> I find a way to implement the "//" pattern, the filter will be even
>> more powerful.
>>
>> I have problems with building a dom tree from the result using xerces
>> and saxon. But with jdom it works great. This needs to be fixed.
>>
>> You cannot rely on the result always being correct, so don't use
>> this in any application; just use it for experimentation.
>>
>> You can find the code at:
>>
>> http://www.npstrandberg.com/projects/saxxpathfragmentfilter/saxxpathfragmentfilter.tar.gz
>>
>> My goal with this filter is to keep it reliable, simple, fast and
>> clean. If you want to contribute to this project, you are
>> welcome. The filter will be released under some kind of open source
>> license (if we get that far!).
>>
>> Test it and give me some feedback on what you think.
>>
>>
>> Regards, Niels Peter Strandberg
>>
>>
>> -----------------------------------------------------------------
>> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>> initiative of OASIS <http://www.oasis-open.org>
>>
>> The list archives are at http://lists.xml.org/archives/xml-dev/
>>
>> To subscribe or unsubscribe from this list use the subscription
>> manager: <http://lists.xml.org/ob/adm.pl>
>>
>

