On Thu, 2004-12-23 at 01:45 -0700, Uche Ogbuji wrote:
> On Thu, 2004-12-23 at 00:53 -0500, John Cowan wrote:
> > Uche Ogbuji scripsit:
> > 
> > > Tenorsax (amara.saxtools.tenorsax) is a framework for "linearizing"
> > > SAX logic so that it flows more naturally, and needs a lot less state
> > > machine wizardry.
> > 
> > This sounds *very* interesting.  Is there a more detailed writeup somewhere?

While on the topic of SAX-taming features in Amara, there is also
amara.saxtools.xpattern_sax_state_machine, which I didn't even bother
mentioning in the announcement (too much to cram in).

This module takes an XPattern (e.g. "/xbel/folder/bookmark") and
generates a state machine which can be plugged into any regular SAX
handler.  That way you can automatically watch for the XPatterns that
match the parts of the document you want to process, and ignore the
rest.  This is sort of the opposite of Tenorsax: embrace the state
machine, but automate it, rather than sweeping it unto a fancy
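
To give a rough idea of what "automating the state machine" means, here
is a minimal sketch using only the standard library's xml.sax.  It is
not Amara's implementation: the names PatternHandler and matching_hrefs
are hypothetical, and it only handles simple absolute paths made of
child steps (such as "/xbel/folder/bookmark"), not full XPattern syntax.

```python
import xml.sax


class PatternHandler(xml.sax.ContentHandler):
    """Hypothetical handler that fires a callback for elements matching
    a simple absolute path such as "/xbel/folder/bookmark"."""

    def __init__(self, pattern, on_match):
        xml.sax.ContentHandler.__init__(self)
        self.steps = pattern.strip("/").split("/")
        self.on_match = on_match
        self.state = 0  # how many leading steps are matched by open elements
        self.depth = 0  # how many elements are currently open

    def startElement(self, name, attrs):
        # Only an element sitting directly below the matched prefix can
        # advance the machine to its next state.
        if (self.depth == self.state
                and self.state < len(self.steps)
                and name == self.steps[self.state]):
            self.state += 1
            if self.state == len(self.steps):
                self.on_match(name, attrs)
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1
        # Closing an element that was part of the matched prefix
        # rewinds the machine to the enclosing state.
        if self.state > self.depth:
            self.state = self.depth


def matching_hrefs(xml_bytes, pattern):
    """Collect the href attribute of every element matching the pattern."""
    found = []
    handler = PatternHandler(
        pattern, lambda name, attrs: found.append(attrs.get("href")))
    xml.sax.parseString(xml_bytes, handler)
    return found
```

The point of the sketch is that the handler itself carries no
hand-written state logic: the states fall straight out of the steps in
the pattern string.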

amara.domtools.pushdom uses this state machine generator to provide a
function where you specify a set of XPatterns and get back a series of
DOM chunks, one at a time, from the SAX parse.  It's like a pulldom,
but a *lot* simpler (and more declarative).  So the following three
lines are *complete* code for printing all links in an XBEL file:

from amara.domtools import pushdom
for docfrag in pushdom("bookmark", xbel_file):
    print docfrag.firstChild.getAttributeNS(None, 'href')

And what's more, no more than the amount of DOM needed to represent each
bookmark node is in memory at any given time (i.e. memory usage is about
as friendly as with SAX).  If you had a terabyte XBEL file, this code
would still only take up a few KB of RAM.
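
For comparison, here is roughly what the same job looks like with the
standard library's xml.dom.pulldom.  This is only a sketch for contrast,
not part of Amara: the function name iter_hrefs is hypothetical, it is
written in current Python syntax, and it matches on the element name
alone rather than on an XPattern.

```python
from xml.dom import pulldom


def iter_hrefs(source, element_name="bookmark"):
    """Yield the href attribute of each matching element, building only
    one small DOM subtree at a time.

    `source` may be a filename or an open binary file object.
    """
    events = pulldom.parse(source)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == element_name:
            # Ask pulldom to build the DOM subtree for just this element.
            events.expandNode(node)
            yield node.getAttribute("href")
```

Here the event loop and the test for the interesting element are
written by hand, which is the part pushdom lets you state as a pattern.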

