   Re: [xml-dev] Penance for misspent attributes


SAX is great for generic XML handling - it's easy to hook up a handler 
for building a document representation using DOM or some other model, 
for instance. It's very awkward for direct processing by an application, 
though, and I think autogenerating state machines just adds another layer 
of complexity.
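
To make the awkwardness concrete, here is a minimal sketch of direct SAX 
processing, assuming Python's standard xml.sax module. FooHandler and 
foo_texts are hypothetical names used only for illustration. Note that the 
handler has to carry a flag and a buffer across callbacks just to collect 
the text of each <foo> element.

```python
import xml.sax


class FooHandler(xml.sax.ContentHandler):
    """Hypothetical handler: collects the text of each <foo> element."""

    def __init__(self):
        super().__init__()
        self.in_foo = False   # state flag shared between callbacks
        self.buffer = []      # chunks seen so far for the current <foo>
        self.results = []     # completed text of every <foo> seen

    def startElement(self, name, attrs):
        if name == "foo":
            self.in_foo = True
            self.buffer = []

    def characters(self, content):
        # The parser may deliver one run of text as several chunks.
        if self.in_foo:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "foo":
            self.results.append("".join(self.buffer))
            self.in_foo = False


def foo_texts(xml_text):
    """Parse xml_text and return the text content of each <foo>."""
    handler = FooHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.results
```

Every additional element the application cares about needs its own flag 
or state in the handler, which is where the complexity comes from.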

Pull parsers seem a better approach for this type of application. Using 
a pull parser gets you away from all the problems of event-driven state 
machine programming and lets you process the document structure 
directly. You can see my JavaWorld comparison at 
http://www.javaworld.com/javaworld/jw-03-2002/jw-0329-xmljava2.html for 
some discussion and code examples on this topic.
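
As a rough illustration of the pull style, here is a minimal sketch that 
collects the text of each <foo> element. It assumes Python's standard 
xml.dom.pulldom module as a stand-in for a pull parser; it is not the 
XMLPull API, and read_foo_texts and read_text are hypothetical names. The 
sketch also assumes <foo> contains only text, no child elements.

```python
from xml.dom import pulldom


def read_text(events):
    """Pull events until the current element ends; return its joined text.

    Assumes the element holds only character data (no child elements).
    """
    parts = []
    for event, node in events:
        if event == pulldom.CHARACTERS:
            parts.append(node.data)
        elif event == pulldom.END_ELEMENT:
            break
    return "".join(parts)


def read_foo_texts(xml_text):
    """Return the text of each <foo>, asking the parser for events on demand."""
    results = []
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "foo":
            # The code's own call structure tracks where we are in the
            # document, so no in_foo flag is needed.
            results.append(read_text(events))
    return results
```

The application drives the loop here, so the document structure maps onto 
ordinary function calls rather than onto state kept between callbacks.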

The only real problem with using pull parsers right now is limited 
availability. The XMLPull site at http://www.xmlpull.org has details of 
the common interface currently implemented by two pull parsers (with 
more to come, hopefully), so it's a big step in the right direction. 
There's also a JSR in progress (JSR-173) to develop a Java standard API 
for pull parsers.

  - Dennis

Bill de hÓra wrote:

> 
>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
>
>
>>-----Original Message-----
>>From: Sean McGrath [mailto:sean.mcgrath@propylon.com] 
>>
>>There is more to it than a buffer. Parsers can and do emit 
>>chunks of content at boundaries that suit themselves. So
>>
>><foo>
>>Hello world
>></foo>
>>
>>is not guaranteed to produce 1 data event that can be slurped 
>>into a buffer in one go. More generally, in the presence of 
>>mixed content there will definitely be multiple chunks. So 
>>you end up with this pattern:
>>
>>start_foo:
>>	buffer = ""
>>	inFoo = 1
>>
>>end_foo:
>>	print buffer
>>	inFoo = 0
>>
>>characters (chunk):
>>	if inFoo:
>>		buffer.append (chunk)
>>
>>This rapidly gets out of hand.
>>
>
>Yes it does. However we can start to accept we're hacking a state
>machine and encapsulate the conditional reasoning:
>
>start_foo:
>	enterState(start_foo)
>
>end_foo:
>	getHandler().execute()
>	leaveState(start_foo)
>
>characters (chunk):
>	getHandler().accept(chunk)
>
>This can be data driven and very fast; it works much like a simple
>dispatching server or the lookup tables common enough in game
>programming. Granted we've been here before about how developers
>find state machines awkward, but it does leave open the possibility
>of being declared and then autogenerated. Was this approach never
>taken with SGML? There doesn't seem to be a lot of work being done in
>the public domain to codegen SAX handlers (maybe I'm looking in the
>wrong places), but I expect it will become common enough. I'm
>pretty sure people are using Maps and the like to key event
>handlers, but I haven't seen it in the wild. 
>
>Bill de hÓra
>
>
>-----BEGIN PGP SIGNATURE-----
>Version: PGP 7.0.4
>
>iQA/AwUBPOT1euaWiFwg2CH4EQKSpACfQmqGmuyyAOOY62QwC837Nr6QzYcAniSL
>TmYoU6Bw1SzOptFaH1ebwiiR
>=m9Fb
>-----END PGP SIGNATURE-----
>
>
>
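
For reference, the enterState / getHandler / leaveState outline in the 
quoted message could be sketched roughly as follows. This is one possible 
reading, assuming Python's standard xml.sax module, a map from element 
name to handler factory, and a stack to track the current state. 
TextCollector, Ignore, DispatchingHandler and collect are all hypothetical 
names, not part of any library.

```python
import xml.sax


class TextCollector:
    """Hypothetical per-element handler: gathers chunks, then acts on them."""

    def __init__(self, sink):
        self.sink = sink
        self.chunks = []

    def accept(self, chunk):
        self.chunks.append(chunk)

    def execute(self):
        self.sink.append("".join(self.chunks))


class Ignore:
    """Handler for elements nobody registered interest in."""

    def accept(self, chunk):
        pass

    def execute(self):
        pass


class DispatchingHandler(xml.sax.ContentHandler):
    """Routes SAX events to handlers looked up by element name."""

    def __init__(self, factories):
        super().__init__()
        self.factories = factories   # element name -> handler factory
        self.stack = [Ignore()]      # current handler is on top

    def get_handler(self):
        return self.stack[-1]

    def startElement(self, name, attrs):
        # enterState: push the handler registered for this element.
        self.stack.append(self.factories.get(name, Ignore)())

    def characters(self, content):
        self.get_handler().accept(content)

    def endElement(self, name):
        # Run the handler, then leaveState by popping it.
        self.get_handler().execute()
        self.stack.pop()


def collect(xml_text, names):
    """Return {name: [text, ...]} for each element name of interest."""
    out = {name: [] for name in names}
    factories = {
        name: (lambda sink=out[name]: TextCollector(sink)) for name in names
    }
    xml.sax.parseString(xml_text.encode("utf-8"), DispatchingHandler(factories))
    return out
```

The conditional logic moves out of the callbacks and into the lookup 
table, so adding an element means adding a map entry rather than another 
flag. Text inside unregistered child elements is routed to Ignore, so in 
mixed content only the element's own direct text is collected.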






 


Copyright 2001 XML.org. This site is hosted by OASIS