   More on taming SAX (was Re: [xml-dev] ANN: Amara XML Toolkit 0.9.0)


On Thu, 2004-12-23 at 01:45 -0700, Uche Ogbuji wrote:
> On Thu, 2004-12-23 at 00:53 -0500, John Cowan wrote:
> > Uche Ogbuji scripsit:
> > 
> > > Tenorsax (amara.saxtools.tenorsax) is a framework for "linearizing"
> > > SAX logic so that it flows more naturally, and needs a lot less state
> > > machine wizardry.
> > 
> > This sounds *very* interesting.  Is there a more detailed writeup somewhere?

While on the topic of SAX taming features in Amara, there is also
amara.saxtools.xpattern_sax_state_machine, which I didn't even bother
mentioning in the announcement (too much to cram in).

This module takes an XPattern (e.g. "/xbel/folder/bookmark") and
generates a state machine which can be plugged into any regular SAX
handler.  In this way, you can automatically watch for the XPatterns
that mark the parts of the document you want to process, and ignore the
rest.  This is sort of the opposite of Tenorsax: embrace the state
machine, but automate it, rather than sweeping it out of sight.
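
To make the idea concrete, here is a minimal sketch of such a state
machine using only the standard library's xml.sax.  It is not Amara's
code or API: the names PathMatcher and find_hrefs are made up for
illustration, and it assumes the pattern is a plain absolute path of
element names with no predicates, wildcards or namespaces.

```python
import xml.sax

class PathMatcher(xml.sax.ContentHandler):
    """Hypothetical handler: calls on_match when the chain of open
    elements equals a simple absolute path such as /xbel/folder/bookmark."""

    def __init__(self, pattern, on_match):
        xml.sax.ContentHandler.__init__(self)
        # "/xbel/folder/bookmark" -> ["xbel", "folder", "bookmark"]
        self.steps = pattern.strip("/").split("/")
        self.on_match = on_match
        self.depth = 0   # number of currently open elements
        self.state = 0   # number of leading pattern steps matched so far

    def startElement(self, name, attrs):
        # Advance only if every ancestor matched and this is the next step.
        if (self.state == self.depth
                and self.state < len(self.steps)
                and name == self.steps[self.state]):
            self.state += 1
            if self.state == len(self.steps):
                self.on_match(name, attrs)
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1
        # Leaving a matched element rolls the state back by one step.
        if self.state > self.depth:
            self.state = self.depth

def find_hrefs(xml_bytes, pattern):
    """Collect the href attribute of every element matching the pattern."""
    found = []
    handler = PathMatcher(pattern, lambda name, attrs: found.append(attrs.get("href")))
    xml.sax.parseString(xml_bytes, handler)
    return found

SAMPLE = (b'<xbel><folder><bookmark href="http://example.org/a"/>'
          b'<folder><bookmark href="http://example.org/nested"/></folder>'
          b'</folder><bookmark href="http://example.org/top"/></xbel>')

print(find_hrefs(SAMPLE, "/xbel/folder/bookmark"))
```

The handler keeps just two integers of state, so the bookkeeping that
would otherwise be written by hand for each document type is derived
from the pattern string instead.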

amara.domtools.pushdom uses this state machine generator to provide a
function where you specify a set of XPatterns, and get back a series of
DOM chunks from the SAX parse.  It's like a pulldom, but a
*lot* simpler (and more declarative).  So the following three lines are
*complete* code for printing all links in an XBEL file:

from amara.domtools import pushdom
for docfrag in pushdom("bookmark", xbel_file):
    print docfrag.firstChild.getAttributeNS(None, 'href')

And what's more, no more than the amount of DOM needed to represent each
bookmark node is in memory at any given time (i.e. memory usage is about
as friendly as with SAX).  If you had a terabyte XBEL file, this code
would still only take up a few KB of RAM.
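
For comparison, here is a rough equivalent written against the standard
library's xml.dom.pulldom, which is the more manual approach that
pushdom is meant to simplify.  This is only a sketch: iter_elements is
a made-up helper name, it matches on a bare element name rather than a
real XPattern, and its memory behavior has not been measured.

```python
import io
from xml.dom import pulldom

def iter_elements(stream, tag_name):
    """Hypothetical helper: yield each <tag_name> element as a small DOM
    subtree, built only when the parser reaches it."""
    events = pulldom.parse(stream)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag_name:
            # Pull the rest of this element's content into the node.
            events.expandNode(node)
            yield node

XBEL = (b'<xbel><folder><bookmark href="http://example.org/a"/>'
        b'</folder><bookmark href="http://example.org/b"/></xbel>')

for bookmark in iter_elements(io.BytesIO(XBEL), "bookmark"):
    print(bookmark.getAttribute("href"))
```

Here the caller has to filter the event stream and call expandNode by
hand; with pushdom that selection is stated once, as a pattern.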

Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Use CSS to display XML - http://www.ibm.com/developerworks/edu/x-dw-x-xmlcss-i.html
Full XML Indexes with Gnosis - http://www.xml.com/pub/a/2004/12/08/py-xml.html
Be humble, not imperial (in design) - http://www.adtmag.com/article.asp?id=10286
UBL 1.0 - http://www-106.ibm.com/developerworks/xml/library/x-think28.html
Use Universal Feed Parser to tame RSS - http://www.ibm.com/developerworks/xml/library/x-tipufp.html
Default and error handling in XSLT lookup tables - http://www.ibm.com/developerworks/xml/library/x-tiplook.html
A survey of XML standards - http://www-106.ibm.com/developerworks/xml/library/x-stand4/
The State of Python-XML in 2004 - http://www.xml.com/pub/a/2004/10/13/py-xml.html

