Bill de hÓra wrote:
>>The only real problem with using pull parsers right now is
>>limited availability.
>>
>
>I cite two other problems (maybe just nits) and two other
>processing options.
>
>
>First problem: event based architectures are likely to become a
>basis for building application servers, particularly as we stumble
>into an era of machine to machine XML processing. Apache Axis is a
>babystep in that direction, possibly a leap if and when it moves to
>the non-blocking IO available in the 1.4 JDK. The problem with
>placing pull based /parsing/ on top of event oriented servers is
>that after working so hard to increase server throughput, you then
>re-insert the processing bottleneck by virtue of the parsing being
>the equivalent of blocking requests. The point isn't made against a
>pull oriented /API/ per se, but if the processing must block, let
>it block as late as possible, that is, just below the application.
>
There are a couple of points I'll comment on in this. The first is that
SAX doesn't really function as an event based architecture component
because it's relying on the application to give it control in the first
place - the application thread is what executes all the parsing, as well
as the call-backs to the handler. This is implicit in the SAX
specification since it does not address any synchronization issues that
would be needed if different threads could be used.

The second is that the servlet architecture that forms the basis of most
application servers is not really extensible to non-blocking IO. The
servlet model ties up a thread until all processing of a request is
completed, so you may as well have the thread just wait for input if needed.
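
The first point, that the application thread executes both the parsing and the handler call-backs, can be checked with a small sketch. This assumes the SAX parser bundled with the JDK; the class and method names (SaxThreadDemo, callbacksRunOnCallerThread) are made up for illustration and are not from the article.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original article code.
class SaxThreadDemo {

    /** Returns true if every startElement call-back ran on the thread that called parse(). */
    static boolean callbacksRunOnCallerThread() throws Exception {
        final Thread caller = Thread.currentThread();
        final int[] callbacks = { 0 };
        final boolean[] sameThread = { true };

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                callbacks[0]++;
                if (Thread.currentThread() != caller) {
                    sameThread[0] = false;
                }
            }
        };

        // parse() does not return until the whole document has been pushed
        // through the handler, so the caller is blocked for the duration.
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(new StringReader("<a><b/><c/></a>")), handler);

        return callbacks[0] == 3 && sameThread[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callbacksRunOnCallerThread());
    }
}
```

With the JDK's default parser the call-backs arrive on the calling thread, so a SAX parse occupies that thread just as a pull parser's read loop would.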
>Second problem: exposing conditional logic based on switch blocks
>instead of visiting is a lost opportunity. I have this static
>binding and external iteration:
>
> ...
>
>when I could have had a runtime binding based on the types of the
>visitor and visitee and internal iteration (presumably the parser
>is best placed to know the token type) via double dispatch.
>
I think this kind of misses the point of using a pull parser. Here's the
actual main loop I gave in the article for working with a structure
consisting of a couple of different types of elements in the document, each
with several child elements:
    // Main pull parsing loop
    byte type;
    while ((type = m_parser.next()) != XmlPullParser.END_DOCUMENT) {

        // Ignore everything other than a start tag
        if (type == XmlPullParser.START_TAG) {

            // Process the start tags we're interested in
            m_parser.readStartTag(m_startTag);
            String lname = m_startTag.getLocalName();
            if (lname.equals("stock-trade")) {
                parseStockTrade();
            } else if (lname.equals("option-trade")) {
                parseOptionTrade();
            }
        }
    }
If I wanted to process other types of document components in this loop I
easily could. In this case my document only used two different types of
child elements of the root, and I wasn't concerned about other types of
components in the document, so I just look specifically for those two
child elements. To process the stock-trade element, which looks like this:
    <stock-trade>
        <symbol>SUNW</symbol>
        <tracking id="7499345">
            <time>08:45:19</time>
            <seller ident="CCC" type="agent"/>
            <buyer ident="ABT" type="agent"/>
            <exchange>XA</exchange>
        </tracking>
        <price>86.24</price>
        <quantity>500</quantity>
    </stock-trade>
I have the following code:
    protected void parseStockTrade()
        throws IOException, XmlPullParserException {
        String symbol = parseElementContent("symbol");
        TrackingData tracking = parseTracking();
        double price = Double.parseDouble(parseElementContent("price"));
        int shares = Integer.parseInt(parseElementContent("quantity"));
        StockTrack.recordTrade(symbol, tracking.m_time, price, shares);
    }

    protected TrackingData parseTracking()
        throws IOException, XmlPullParserException {

        // Read id attribute from root element start tag
        TrackingData data = new TrackingData();
        parseStartTag("tracking");
        data.m_id = attributeValue("id");

        // Read time as content of its own element
        data.m_time = parseElementContent("time");

        // Read seller agent information
        parseStartTag("seller");
        data.m_seller = attributeValue("ident");
        data.m_isDirectSeller = "direct".equals(attributeValue("type"));
        parseEndTag("seller");

        // Read buyer agent information
        parseStartTag("buyer");
        data.m_buyer = attributeValue("ident");
        data.m_isDirectBuyer = "direct".equals(attributeValue("type"));
        parseEndTag("buyer");

        // Read exchange identifier as content of its own element
        data.m_exchange = parseElementContent("exchange");

        // Finish with closing tag for root element
        parseEndTag("tracking");
        return data;
    }
Using some simple utility methods I can parse the data content of the
document very easily with direct inline code, rather than having to use
a state machine. I think this is a much more natural style of
programming for most developers - a top-down structure in the code that
reflects the structure of the document.
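
For readers without the article at hand, here is a rough sketch of what utility methods like these might look like. It is not the article's code: it swaps in the JDK's StAX XMLStreamReader (a different pull API from the XmlPullParser used above) so that it runs without extra libraries, and the class name TrackingReader and the method bodies are my own assumptions. Only the method names mirror the ones used above.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class built on StAX, not on the XPP API from the article.
class TrackingReader {
    private final XMLStreamReader reader;

    TrackingReader(String xml) throws XMLStreamException {
        reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
    }

    /** Skip whitespace to the next tag and require a start tag with this name. */
    void parseStartTag(String name) throws XMLStreamException {
        reader.nextTag();
        reader.require(XMLStreamConstants.START_ELEMENT, null, name);
    }

    /** Skip whitespace to the next tag and require an end tag with this name. */
    void parseEndTag(String name) throws XMLStreamException {
        reader.nextTag();
        reader.require(XMLStreamConstants.END_ELEMENT, null, name);
    }

    /** Attribute of the current start tag, or null if it is absent. */
    String attributeValue(String name) {
        return reader.getAttributeValue(null, name);
    }

    /** Read <name>text</name> and return the text, ending on the close tag. */
    String parseElementContent(String name) throws XMLStreamException {
        parseStartTag(name);
        return reader.getElementText();
    }

    /** Top-down read of the tracking element, summarized as one string. */
    String parseTracking() throws XMLStreamException {
        parseStartTag("tracking");
        String id = attributeValue("id");
        String time = parseElementContent("time");
        parseStartTag("seller");
        String seller = attributeValue("ident");
        parseEndTag("seller");
        parseStartTag("buyer");
        String buyer = attributeValue("ident");
        parseEndTag("buyer");
        String exchange = parseElementContent("exchange");
        parseEndTag("tracking");
        return id + " " + time + " " + seller + "->" + buyer + " @" + exchange;
    }

    /** Runs parseTracking() over the tracking fragment shown earlier. */
    static String demo() throws XMLStreamException {
        String xml = "<tracking id=\"7499345\">\n"
            + "  <time>08:45:19</time>\n"
            + "  <seller ident=\"CCC\" type=\"agent\"/>\n"
            + "  <buyer ident=\"ABT\" type=\"agent\"/>\n"
            + "  <exchange>XA</exchange>\n"
            + "</tracking>";
        return new TrackingReader(xml).parseTracking();
    }
}
```

Each helper just advances the parser and checks that it landed where the document structure says it should, which is why the calling code can read straight down the page.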
I could wrap a pull parser in handlers to give the same effect as a SAX
parser interface - in fact, Alek Slominski has actually implemented a
prototype SAX2 push layer on top of a pull parser
(http://www.extreme.indiana.edu/xgws/xsoap/xpp/). Trying to turn a push
interface into a pull interface is much more difficult, basically
requiring a separate thread and associated threading overhead.
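
To make the threading cost concrete, here is a minimal sketch of a pull interface layered over a push (SAX) parser. It assumes the JDK's bundled SAX parser; the class name PushToPullAdapter, the string event encoding, and the queue size are all invented for illustration, and this is not Alek's code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical adapter: a SAX parser runs in its own thread and hands
// events to the consumer through a bounded queue.
class PushToPullAdapter {
    static final String END = "END_DOCUMENT"; // sentinel marking end of input
    private final BlockingQueue<String> events = new ArrayBlockingQueue<>(16);

    PushToPullAdapter(final String xml) {
        Thread producer = new Thread(() -> {
            try {
                SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes atts) {
                            put("START " + qName);
                        }

                        @Override
                        public void endElement(String uri, String localName,
                                               String qName) {
                            put("END " + qName);
                        }
                    });
            } catch (Exception e) {
                put("ERROR " + e.getMessage());
            } finally {
                put(END);
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    // Blocks the parser thread whenever the consumer falls behind.
    private void put(String event) {
        try {
            events.put(event);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Pull the next event, waiting for the parser thread if necessary. */
    String next() throws InterruptedException {
        return events.take();
    }

    /** Drain a document through the pull interface and list its events. */
    static List<String> pullAll(String xml) throws InterruptedException {
        PushToPullAdapter parser = new PushToPullAdapter(xml);
        List<String> seen = new ArrayList<>();
        for (String ev = parser.next(); !END.equals(ev); ev = parser.next()) {
            seen.add(ev);
        }
        return seen;
    }
}
```

Every event crosses a thread boundary through the queue, which is the overhead referred to above. Going the other way needs no extra thread at all: a loop over next() that calls the handler methods is enough.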
If you really want an object-oriented approach to processing XML I think
data binding is the best alternative. SAX is great when you want to use
a visitor-style approach, but IMHO is awkward in many situations because
the data is delivered to the application one piece at a time and needs
to be assembled before use - that's basically the point of the handler
generator programs mentioned in this thread, as well as the handler
examples you provide. Pull parsers let applications handle the assembly
directly, making use of information about the document structure.
- Dennis