xml-dev - Re: [xml-dev] SAX and Pull options: was: Penance for misspent attributes

To: Bill de hÓra <dehora@eircom.net>
Subject: Re: [xml-dev] SAX and Pull options: was: Penance for misspent attributes
From: Dennis Sosnoski <dms@sosnoski.com>
Date: Mon, 20 May 2002 21:29:33 -0700
Cc: xml-dev@lists.xml.org
References: <000501c20068$2ba4e1e0$887ba8c0@mitchum>
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.8) Gecko/20020205

Bill de hÓra wrote:

>>The only real problem with using pull parsers right now is
>>limited availability.
>>
>
>I cite two other problems (maybe just nits) and two other
>processing options. 
>
>
>First problem: event based architectures are likely to become a
>basis for building application servers, particularly as we stumble
>into an era of machine to machine XML processing. Apache Axis is a
>babystep in that direction, possibly a leap if and when it moves to
>the non-blocking IO available in the 1.4 JDK. The problem with
>placing pull based /parsing/ on top of event oriented servers is
>that after working so hard to increase server throughput, you then
>re-insert the processing bottleneck by virtue of the parsing being
>the equivalent of blocking requests. The point isn't made against a
>pull oriented /API/ per se, but if the processing must block, let
>it block as late as possible, that is, just below the application. 
>
There are a couple of points I'll comment on in this. The first is that
SAX doesn't really function as an event based architecture component,
because it relies on the application to give it control in the first
place: the application thread is what executes all the parsing, as well
as the call-backs to the handler. This is implicit in the SAX
specification, since it does not address any of the synchronization
issues that would arise if different threads could be used.
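
The point about the application thread doing all the work can be checked with a small experiment. The following is only a sketch, using the JAXP SAX parser bundled with the JDK rather than any particular parser from this thread; the class name SaxThreadDemo and the sample document are made up for illustration. It records which thread runs the startElement call-back so it can be compared with the thread that called parse().

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the name and the sample document are illustrative only.
class SaxThreadDemo {

    // Parses the XML and returns the name of the thread that ran startElement.
    static String callbackThread(String xml) throws Exception {
        final String[] seen = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Record which thread the parser used to call back into us
                seen[0] = Thread.currentThread().getName();
            }
        };
        // parse() does not return until the whole document has been pushed
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return seen[0];
    }

    public static void main(String[] args) throws Exception {
        String caller = Thread.currentThread().getName();
        String callback = callbackThread(
            "<stock-trade><symbol>SUNW</symbol></stock-trade>");
        System.out.println("caller=" + caller + ", callback=" + callback);
    }
}
```

With the JDK's default parser the two names should match, which is the sense in which the parser only runs when the application lends it a thread.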

The second is that the servlet architecture that forms the basis of most 
application servers is not really extensible to non-blocking IO. The 
servlet model ties up a thread until all processing of a request is 
completed, so you may as well have the thread just wait for input if needed.

>Second problem: exposing conditional logic based on switch blocks
>instead of visiting is a lost opportunity. I have this static
>binding and external iteration:
>
>    ...
>
>when I could have had a runtime binding based on the types of the
>visitor and visitee and internal iteration (presumably the parser
>is best placed to know the token type) via double dispatch. 
>
I think this kind of misses the point of using a pull parser. Here's the
actual main loop I gave in the article for working with a structure
consisting of a couple of different types of elements in the document, each
with several child elements:

        // Main pull parsing loop
        byte type;
        while ((type = m_parser.next()) != XmlPullParser.END_DOCUMENT) {

            // Ignore everything other than a start tag
            if (type == XmlPullParser.START_TAG) {

                // Process the start tags we're interested in
                m_parser.readStartTag(m_startTag);
                String lname = m_startTag.getLocalName();
                if (lname.equals("stock-trade")) {
                    parseStockTrade();
                } else if (lname.equals("option-trade")) {
                    parseOptionTrade();
                }
            }
        }

If I wanted to process other types of document components in this loop I 
easily could. In this case my document only used two different types of 
child elements of the root, and I wasn't concerned about other types of 
components in the document, so I just look specifically for those two 
child elements. To process the stock-trade element, which looks like this:

  <stock-trade>
    <symbol>SUNW</symbol>
    <tracking id="7499345">
      <time>08:45:19</time>
      <seller ident="CCC" type="agent"/>
      <buyer ident="ABT" type="agent"/>
      <exchange>XA</exchange>
    </tracking>
    <price>86.24</price>
    <quantity>500</quantity>
  </stock-trade>

I have the following code:

    protected void parseStockTrade()
        throws IOException, XmlPullParserException {
        String symbol = parseElementContent("symbol");
        TrackingData tracking = parseTracking();
        double price = Double.parseDouble(parseElementContent("price"));
        int shares = Integer.parseInt(parseElementContent("quantity"));
        StockTrack.recordTrade(symbol, tracking.m_time, price, shares);
    }

    protected TrackingData parseTracking()
        throws IOException, XmlPullParserException {

        // Read id attribute from root element start tag
        TrackingData data = new TrackingData();
        parseStartTag("tracking");
        data.m_id = attributeValue("id");

        // Read time as content of its own element
        data.m_time = parseElementContent("time");

        // Read seller agent information
        parseStartTag("seller");
        data.m_seller = attributeValue("ident");
        data.m_isDirectSeller = "direct".equals(attributeValue("type"));
        parseEndTag("seller");

        // Read buyer agent information
        parseStartTag("buyer");
        data.m_buyer = attributeValue("ident");
        data.m_isDirectBuyer = "direct".equals(attributeValue("type"));
        parseEndTag("buyer");

        // Read exchange identifier as content of its own element
        data.m_exchange = parseElementContent("exchange");

        // Finish with closing tag for root element
        parseEndTag("tracking");
        return data;
    }

Using some simple utility methods I can parse the data content of the 
document very easily with direct inline code, rather than having to use 
a state machine. I think this is a much more natural style of 
programming for most developers - a top-down structure in the code that 
reflects the structure of the document.
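
The utility methods themselves are not shown in the post. As a rough idea of how small they can be, here is a sketch written against StAX (javax.xml.stream), which is a different pull API from the one used in the code above and is swapped in only because it ships with the JDK. The method names follow the post; the class name PullHelpers and the method bodies are assumptions, not the article's code.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class: method names follow the post, bodies are guesses
// written against StAX rather than the parser API used in the post.
class PullHelpers {
    private final XMLStreamReader reader;

    PullHelpers(String xml) throws XMLStreamException {
        reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
    }

    // Advance to the next tag (skipping whitespace) and check it opens the named element
    void parseStartTag(String name) throws XMLStreamException {
        reader.nextTag();
        reader.require(XMLStreamConstants.START_ELEMENT, null, name);
    }

    // Advance to the next tag and check it closes the named element
    void parseEndTag(String name) throws XMLStreamException {
        reader.nextTag();
        reader.require(XMLStreamConstants.END_ELEMENT, null, name);
    }

    // Value of an attribute on the current start tag, or null if absent
    String attributeValue(String name) {
        return reader.getAttributeValue(null, name);
    }

    // Read <name>text</name> and return the text; leaves the reader on the end tag
    String parseElementContent(String name) throws XMLStreamException {
        parseStartTag(name);
        return reader.getElementText();
    }

    public static void main(String[] args) throws XMLStreamException {
        PullHelpers p = new PullHelpers(
            "<tracking id=\"7499345\"><time>08:45:19</time>"
            + "<seller ident=\"CCC\" type=\"agent\"/></tracking>");
        p.parseStartTag("tracking");
        System.out.println("id=" + p.attributeValue("id"));
        System.out.println("time=" + p.parseElementContent("time"));
        p.parseStartTag("seller");
        System.out.println("seller=" + p.attributeValue("ident"));
        p.parseEndTag("seller");
        p.parseEndTag("tracking");
    }
}
```

Each helper throws if the document does not match what the code expects, so the inline parsing code doubles as a structure check.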

I could wrap a pull parser in handlers to give the same effect as a SAX 
parser interface - in fact, Alek Slominski has actually implemented a 
prototype SAX2 push layer on top of a pull parser 
(http://www.extreme.indiana.edu/xgws/xsoap/xpp/). Trying to turn a push 
interface into a pull interface is much more difficult, basically 
requiring a separate thread and associated threading overhead.
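
To make the threading cost concrete, here is a minimal sketch of a pull interface layered over a push parser. It assumes the JDK's JAXP SAX parser and reports only start-tag names; the class name PushToPull, the sentinel value, and the choice of a SynchronousQueue are all illustrative assumptions, not taken from any existing adapter.

```java
import java.io.StringReader;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical adapter: exposes start-tag names from a SAX parser through next().
class PushToPull {
    private static final String END = "#end-of-document";  // not a legal element name
    private final BlockingQueue<String> events = new SynchronousQueue<>();

    PushToPull(final String xml) {
        // The SAX parser needs a thread of its own, since parse() keeps control
        Thread producer = new Thread(() -> {
            try {
                SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String local,
                                                 String qName, Attributes atts) {
                            put(qName);  // blocks until the consumer asks for it
                        }
                    });
            } catch (Exception e) {
                // A fuller version would hand the failure to the consumer
            } finally {
                put(END);
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    private void put(String event) {
        try {
            events.put(event);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns the next start-tag name, or null at end of document.
    // Must not be called again after it has returned null.
    String next() throws InterruptedException {
        String event = events.take();
        return END.equals(event) ? null : event;
    }

    public static void main(String[] args) throws Exception {
        PushToPull parser = new PushToPull("<a><b/><c/></a>");
        for (String name = parser.next(); name != null; name = parser.next()) {
            System.out.println("start tag: " + name);
        }
    }
}
```

Every event costs a hand-off between two threads, which is the overhead in question; the reverse adapter needs nothing more than a loop that calls the handler.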

If you really want an object-oriented approach to processing XML I think 
data binding is the best alternative. SAX is great when you want to use 
a visitor-style approach, but IMHO is awkward in many situations because 
the data is delivered to the application one piece at a time and needs 
to be assembled before use - that's basically the point of the handler 
generator programs mentioned in this thread, as well as the handler 
examples you provide. Pull parsers let applications handle the assembly 
directly, making use of information about the document structure.
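
For comparison, this is roughly what the one-piece-at-a-time assembly looks like in a SAX handler for part of the stock-trade element. It is a sketch only, assuming the JDK's JAXP SAX parser; the class name StockTradeHandler and its fields are hypothetical and are not any of the handlers discussed in this thread.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects symbol, price and quantity from one stock-trade.
class StockTradeHandler extends DefaultHandler {
    String symbol;
    double price;
    int quantity;
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        text.setLength(0);  // start collecting character data for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);  // may be called several times per text run
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only now is the element's content complete enough to use
        String value = text.toString().trim();
        switch (qName) {
            case "symbol":   symbol = value; break;
            case "price":    price = Double.parseDouble(value); break;
            case "quantity": quantity = Integer.parseInt(value); break;
            default: break;  // other elements are ignored in this sketch
        }
        text.setLength(0);
    }

    static StockTradeHandler parse(String xml) throws Exception {
        StockTradeHandler handler = new StockTradeHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        StockTradeHandler h = parse("<stock-trade><symbol>SUNW</symbol>"
            + "<price>86.24</price><quantity>500</quantity></stock-trade>");
        System.out.println(h.symbol + " " + h.price + " " + h.quantity);
    }
}
```

The handler has to buffer text and dispatch on element names because it never sees more than one event at a time; the expected order of the child elements appears nowhere in the code.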

  - Dennis
