> Also, I noticed that there is a patent filed on StAX by the StAX spec lead:
> http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2F
> netahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PG01&s1=20030159112
> .PGNR.&OS=DN/20030159112&RS=DN/20030159112
What is so unique about StAX that it warrants a patent?
Can anyone enlighten me?
I looked at http://www.xml.com/pub/a/2003/09/17/stax.html, an
introduction to StAX by Elliotte Rusty Harold, and fail to see
any major breakthrough. Pull APIs have been around for a while,
and with the old MSXML 3.0 IMXReaderControl interface, pull
parsing as demonstrated with StAX is easily possible
on top of SAX. Really, all you need is the capability
to suspend a SAX parser and then resume it.
The only "API" we need to define is a simple ParseEvent interface
(I am not a Java programmer, so please forgive syntax errors):
public interface ParseEvent {
    final int START_ELEMENT = 1;
    final int END_ELEMENT = 2;
    final int COMMENT = 3;
    final int PI = 4;
    // ... and so forth
    int getEventType();
    XMLNode getEventNode();
}
where XMLNode (or rather a sub-class) exposes properties
appropriate to the event.
A Java adaptation of IMXReaderControl could look like this:
public interface XMLReaderControl {
    final int READY = 0;     // status constant
    final int PARSING = 1;   // status constant
    final int SUSPENDED = 2; // status constant
    void suspend();          // legal when PARSING
    void abort();            // legal when PARSING, SUSPENDED
    void resume();           // legal when SUSPENDED
    int getStatus();
}
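The "legal when" comments amount to a small state machine. A minimal sketch of how they might be enforced follows. It rests on assumptions not stated above: the class name SimpleReaderControl and the start() method are hypothetical, and so is the choice that abort() returns the reader to READY.

```java
// Hypothetical sketch: none of these names belong to SAX or MSXML.
final class SimpleReaderControl {
    static final int READY = 0;
    static final int PARSING = 1;
    static final int SUSPENDED = 2;

    private int status = READY;

    // Assumed entry point: the parser would call this when parse() begins.
    void start() {
        require(status == READY, "start");
        status = PARSING;
    }

    void suspend() {                       // legal when PARSING
        require(status == PARSING, "suspend");
        status = SUSPENDED;
    }

    void resume() {                        // legal when SUSPENDED
        require(status == SUSPENDED, "resume");
        status = PARSING;
    }

    void abort() {                         // legal when PARSING or SUSPENDED
        require(status == PARSING || status == SUSPENDED, "abort");
        status = READY;                    // assumption: abort resets to READY
    }

    int getStatus() {
        return status;
    }

    // Rejects any call that is not legal in the current status.
    private void require(boolean legal, String op) {
        if (!legal) {
            throw new IllegalStateException(op + " is not legal in status " + status);
        }
    }
}
```

Throwing on an illegal call is only one option; silently ignoring the call would also fit the interface as written.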
The main pull parsing loop with a suspendable SAX parser could
then look like this, assuming the content handler implements
the ParseEvent interface described above and calls ReaderControl.suspend()
in the appropriate places:
XMLReader.parse(inputSource);  // returns once the handler first calls suspend()
while (ReaderControl.getStatus() == SUSPENDED) {
    switch (ParseEvent.getEventType()) {
        case START_ELEMENT:
            // cast getEventNode() to ElementNode and process it
            break;
        case END_ELEMENT:
            // cast getEventNode() to ElementNode and process it
            break;
        case PI:
            // cast getEventNode() to PINode and process it
            break;
        case COMMENT:
            // cast getEventNode() to CommentNode and process it
            break;
        // ... and so forth
    }
    ReaderControl.resume();    // parse on until the next suspend() or end of input
}
One can likely implement a reusable PullHandler class that
does most of the grunt work for that purpose. It would also
be useful to be able to plug in an event filter class so
that one iterates only over the events of interest.
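As a rough illustration of such a PullHandler, here is a sketch in Java. Standard Java SAX has no suspend/resume, so the sketch swaps in a different technique: the parser runs in a background thread and blocks inside each callback until the consumer asks for the next event. That approximates suspension, with the parser running at most one event ahead. All names other than the SAX and java.util.concurrent classes are hypothetical, events carry only a type and a name rather than an XMLNode, and the END_DOCUMENT marker is an addition.

```java
import java.io.StringReader;
import java.util.concurrent.SynchronousQueue;
import java.util.function.IntPredicate;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: only the SAX and java.util.concurrent classes are real APIs.
final class PullHandler extends DefaultHandler {
    static final int END_DOCUMENT = 0;   // extra marker, assumed for this sketch
    static final int START_ELEMENT = 1;
    static final int END_ELEMENT = 2;
    static final int PI = 4;

    static final class Event {
        final int type;
        final String name;
        Event(int type, String name) { this.type = type; this.name = name; }
    }

    // A SynchronousQueue has no capacity: put() blocks until the consumer calls take().
    private final SynchronousQueue<Event> handoff = new SynchronousQueue<>();
    private final IntPredicate filter;

    private PullHandler(IntPredicate filter) { this.filter = filter; }

    // Starts parsing in a daemon thread; the filter selects the event types to deliver.
    static PullHandler parse(String xml, IntPredicate filter) {
        PullHandler h = new PullHandler(filter);
        Thread t = new Thread(() -> {
            try {
                SAXParserFactory.newInstance().newSAXParser()
                        .parse(new InputSource(new StringReader(xml)), h);
            } catch (Exception e) {
                // Sketch only: a fuller version would pass the error on to the consumer.
            }
            try {
                h.handoff.put(new Event(END_DOCUMENT, null)); // always delivered, never filtered
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
        return h;
    }

    // Blocks until the parser produces the next event of interest.
    // Do not call again after END_DOCUMENT has been returned.
    Event next() {
        try {
            return handoff.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // The parser thread waits here, which plays the role of suspend().
    private void publish(int type, String name) throws SAXException {
        if (!filter.test(type)) return;
        try {
            handoff.put(new Event(type, name));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new SAXException(e);
        }
    }

    @Override public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException { publish(START_ELEMENT, qName); }

    @Override public void endElement(String uri, String local, String qName)
            throws SAXException { publish(END_ELEMENT, qName); }

    @Override public void processingInstruction(String target, String data)
            throws SAXException { publish(PI, target); }
}

final class PullDemo {
    public static void main(String[] args) {
        // Skip END_ELEMENT events; iterate over everything else.
        PullHandler in = PullHandler.parse("<a><b/><?pi x?></a>",
                type -> type != PullHandler.END_ELEMENT);
        for (PullHandler.Event e = in.next(); e.type != PullHandler.END_DOCUMENT; e = in.next()) {
            System.out.println(e.type + " " + e.name);
        }
    }
}
```

The thread hand-off costs a context switch per event, which a parser with real suspend/resume support would avoid.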
I am not sure which streaming parsers implement such suspend functionality;
I only know that there is already a patch for Expat (not part of the distribution yet)
that achieves it. I don't think it should be a major problem to implement.
Karl