SAX is great for generic XML handling - it's easy to hook up a handler
for building a document representation using DOM or some other model,
for instance. It's very awkward for direct processing by an application,
though, and I think autogenerating state machines just adds another layer
of complexity.
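To make the awkwardness concrete, here is a minimal sketch of the buffering
pattern from the quoted message below, written against the standard
org.xml.sax API. The class and method names (SaxFooText, FooHandler, fooText)
are made up for illustration. The flag and the buffer are the hand-written
state machine, and this is only a single element:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxFooText {
    // Hypothetical handler: collects the text of the most recent <foo>.
    static class FooHandler extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();
        private boolean inFoo = false;
        String lastFoo;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("foo".equals(qName)) {
                buffer.setLength(0); // start a fresh buffer for this element
                inFoo = true;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("foo".equals(qName)) {
                lastFoo = buffer.toString();
                inFoo = false;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per element, so append rather than assign.
            if (inFoo) {
                buffer.append(ch, start, length);
            }
        }
    }

    // Parses the XML string and returns the text of the last <foo>, or null if none.
    static String fooText(String xml) {
        try {
            FooHandler handler = new FooHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.lastFoo;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Mixed content: the text arrives in separate chunks around <b>.
        System.out.println(fooText("<foo>Hello <b>big</b> world</foo>"));
    }
}
```

Every additional element the application cares about needs its own flag or
state, which is where this approach starts to get out of hand.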
Pull parsers seem a better approach for this type of application. Using
a pull parser gets you away from all the problems of event-driven state
machine programming and lets you process the document structure
directly. You can see my JavaWorld comparison at
http://www.javaworld.com/javaworld/jw-03-2002/jw-0329-xmljava2.html for
some discussion and code examples on this topic.
The only real problem with using pull parsers right now is limited
availability. The XMLPull site at http://www.xmlpull.org has details of
the common interface currently implemented by two pull parsers (with
more to come, I hope), so it's a big step in the right direction.
There's also a JSR in progress (JSR-173) to develop a Java standard API
for pull parsers.
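
As a rough illustration of the pull style, here is a minimal sketch that reads
the text of a <foo> element. It assumes the javax.xml.stream (StAX) API, which
is the API that later came out of JSR-173, rather than the XMLPull interface.
The class and method names (PullFooText, fooText) are made up for the example:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PullFooText {
    // Returns the text of the first <foo> element, or null if there is none.
    static String fooText(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            // The application asks for the next event instead of being called back.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "foo".equals(reader.getLocalName())) {
                    // Gathers all the text up to </foo> in one call; note that it
                    // throws if <foo> contains child elements (mixed content).
                    return reader.getElementText();
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fooText("<foo>Hello world</foo>")); // prints "Hello world"
    }
}
```

There are no flags or buffers to maintain here: the position in the code
tracks the position in the document.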
- Dennis
Bill de hÓra wrote:
>
>>-----Original Message-----
>>From: Sean McGrath [mailto:sean.mcgrath@propylon.com]
>>
>>There is more to it than a buffer. Parsers can and do emit
>>chunks of content at boundaries that suit themselves. So
>>
>><foo>
>>Hello world
>></foo>
>>
>>is not guaranteed to produce 1 data event that can be slurped
>>into a buffer in one go. More generally, in the presence of
>>mixed content there will definitely be multiple chunks. So
>>you end up with this pattern:
>>
>>start_foo:
>> buffer = ""
>> inFoo = 1
>>
>>end_foo:
>> print buffer
>> inFoo = 0
>>
>>characters (chunk):
>> if inFoo:
>>   buffer.append (chunk)
>>
>>This rapidly gets out of hand.
>>
>
>Yes it does. However we can start to accept we're hacking a state
>machine and encapsulate the conditional reasoning:
>
>start_foo:
> enterState(start_foo)
>
>end_foo:
> getHandler().execute()
> leaveState(start_foo)
>
>characters (chunk):
> getHandler().accept(chunk)
>
>this can be data driven and very fast; it works much like a simple
>dispatching server or the lookup tables common enough in game
>programming. Granted we've been here before about how developers
>find state machines awkward but it does leave open the possibility
>of being declared and then autogenerated. Was this approach never
>taken with SGML? There doesn't seem to be a lot of work being done in
>the public domain to codegen saxhandlers (maybe I'm looking in the
>wrong places), but I expect it will become common enough. I'm
>pretty sure people are using Maps and the like to key event
>handlers, but I haven't seen it in the wild.
>
>Bill de hÓra
>
>-----------------------------------------------------------------
>The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>initiative of OASIS <http://www.oasis-open.org>
>
>The list archives are at http://lists.xml.org/archives/xml-dev/