xml-dev - XPath filtering SAX was Re: SAX2: marrying SAX and DOM

From: "Laurent Bossavit" <laurent@mmania.com>
To: xml-dev@xml.org
Date: Mon, 13 Mar 2000 17:08:59 +0100

Ken MacLeod wrote:

>   * A parser can [should?] maintain a partial DOM tree, at least
>     parents, that would allow other XML functions to be used.  For
>     example, using XPath to perform matching.

I have been doing just that: implementing 'higher-level' event 
dispatching from a SAX event stream to a listener, which declares the 
data it is interested in as XPath expressions.

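The API goes roughly as follows (simplified for illustration):

import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

// Receives the nodes selected by a registered expression.
public interface XPathListener {
	void handleData(Node[] nodes);
}

// Dispatches matching parts of a SAX event stream to listeners.
public interface XPathFilter {
	void addListener(XPathListener l, String match);
	void process(Parser p, InputSource i);
}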

Client code which wants to retrieve some data from an XML stream 
registers a node set expression identifying 'data of interest', and 
only this data will be returned.

Assume the following XML data (partial document):
<stream>
  <data type="int">1</data>
  <data type="str">x</data>
  <data type="str">y</data>
.../...

One would register interest in the value of 'data' elements with 
'str' types using the following code:

   XPathFilter xpf;  // assumed to refer to some implementation
   xpf.addListener(this, "data[@type='str']/text()");
   xpf.process(somesaxparser, someinputsource);

The above data would result in two handleData() calls, one for each 
text node of a data element with type 'str'. This is much cleaner 
than the alternative - keeping track of state information in an 
object's startElement()/characters()/endElement() methods - 
especially if the element tree is more than a couple of levels deep.
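
For comparison, here is a minimal sketch of that hand-coded alternative 
for the sample data above. It is written against current Java and the 
JAXP SAX2 DefaultHandler rather than the SAX1 Parser interface, and the 
class name StrDataHandler and its collect() helper are made up for 
illustration. It also ignores the case of a 'data' element nested 
inside another one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical hand-coded handler: collects the text of <data type="str"> elements.
class StrDataHandler extends DefaultHandler {
    final List<String> values = new ArrayList<>();
    private boolean inStrData = false;  // state carried between callbacks
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if ("data".equals(qName) && "str".equals(atts.getValue("type"))) {
            inStrData = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // characters() may be called several times for one text node.
        if (inStrData) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (inStrData && "data".equals(qName)) {
            values.add(text.toString());
            inStrData = false;
        }
    }

    // Parses the given document and returns the collected values.
    static List<String> collect(String xml) throws Exception {
        StrDataHandler h = new StrDataHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), h);
        return h.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<stream><data type=\"int\">1</data>"
                   + "<data type=\"str\">x</data><data type=\"str\">y</data></stream>";
        System.out.println(collect(xml));  // prints [x, y]
    }
}
```

Every new kind of 'data of interest' means another flag and another 
branch in each of the three callbacks, which is the bookkeeping the 
expression-based API is meant to hide.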

Naturally, not all XPath features 'work' over SAX - e.g. following-* 
axes or position() calls; how many do depends on how much of the DOM 
tree you are willing to build as you go along. I'm fairly sure though 
that with suitable restrictions this would be a worthwhile addition to 
the XML developer's arsenal, because XPath expressions are a concise 
way of identifying only the parts of an XML data stream that your 
program is interested in - without hand-coding specific automata every 
time.
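
To make 'suitable restrictions' concrete, here is a rough sketch of a 
streaming matcher that keeps only a stack of open elements (the 
'parents') and accepts exactly one expression shape, 
name[@attr='value']/text(). It is not the implementation described in 
this message: the class TinyPathFilter, its regular-expression 
'parser', and the choice to deliver plain strings instead of Node 
objects are all assumptions made for illustration, again using current 
Java and SAX2.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical streaming matcher for one restricted expression shape:
//   name[@attr='value']/text()
class TinyPathFilter extends DefaultHandler {
    private static final Pattern SHAPE =
        Pattern.compile("(\\w+)\\[@(\\w+)='([^']*)'\\]/text\\(\\)");

    private final String name, attr, value;
    private final Consumer<String> listener;
    // One entry per open element: does it satisfy the element step?
    private final Deque<Boolean> open = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();

    TinyPathFilter(String expr, Consumer<String> listener) {
        Matcher m = SHAPE.matcher(expr);
        if (!m.matches()) throw new IllegalArgumentException("unsupported: " + expr);
        name = m.group(1);
        attr = m.group(2);
        value = m.group(3);
        this.listener = listener;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        flush();  // a child element ends the parent's current text node
        open.push(name.equals(qName) && value.equals(atts.getValue(attr)));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Buffer text only when the innermost open element matched.
        if (!open.isEmpty() && open.peek()) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        flush();
        open.pop();
    }

    // Deliver the buffered characters, if any, as one text node.
    private void flush() {
        if (text.length() > 0) {
            listener.accept(text.toString());
            text.setLength(0);
        }
    }

    // Runs one expression over a document and returns the delivered text nodes.
    static List<String> run(String expr, String xml) throws Exception {
        List<String> out = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(new StringReader(xml)), new TinyPathFilter(expr, out::add));
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<stream><data type=\"int\">1</data>"
                   + "<data type=\"str\">x</data><data type=\"str\">y</data></stream>";
        System.out.println(run("data[@type='str']/text()", xml));  // prints [x, y]
    }
}
```

Because a match is decided from the open-element stack alone, each 
text node can be delivered as soon as it ends, without waiting for the 
rest of the stream. Anything that needs later siblings would require 
buffering more of the tree.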

If you are parsing whole documents, an XPath matcher on top of the 
DOM will do fine - but this does not work if you need to parse an 
incoming XML-formatted data stream and process data as it becomes 
available, and the class of application I'm working on (real-time 
chat using an XML-formatted protocol) requires that.

I have a quick-and-dirty, proof-of-concept implementation which 
works, kind of - the 'best' way of delivering data-of-interest to 
client code is not obvious (whether to use a DOM-compliant Node class 
or something more lightweight, whether to use arrays or more complex 
collections), and the XPath expression parser is unbelievably crude - 
mostly because in current XPath implementations the parsing code 
cannot be easily separated from code that relies on the DOM.

If anyone is working on something similar, or has suggestions on API 
or implementation, I'm interested in your comments.

========================================
Laurent Bossavit     -     Ingénieur R&D
>>>        laurent@mmania.com        <<<
>>            ICQ#39281367            <<
MultiMania     http://www.multimania.fr/
========================================

Follow-Ups:
- Re: XPath filtering SAX was Re: SAX2: marrying SAX and DOM
  - From: Stefan Haustein <haustein@kimo.cs.uni-dortmund.de>

References:
- SAX2: marrying SAX and DOM
  - From: Ken MacLeod <ken@bitsko.slc.ut.us>
