
   RE: [xml-dev] ANN: Amara XML Toolkit 0.9.0


Interesting. We seem to be rediscovering co-routines, plus a lot of other
machinery from Jackson structured programming. It's a powerful solution to
the push-pull dilemma, but it does need support at the programming language
level (because the process has multiple stacks). I tried to do something
similar in a very early version of Saxon, but it relied on Java threads and
became very unwieldy.

Of course if you move to a higher level of programming (say XSLT or XQuery)
then the push-pull decisions, and the mechanisms used to handle push-pull
conflicts, get hidden under the covers and programmers don't need to worry
about them.

Michael Kay
http://www.saxonica.com/ 

> -----Original Message-----
> From: Uche Ogbuji [mailto:uche.ogbuji@fourthought.com] 
> Sent: 23 December 2004 08:45
> To: John Cowan
> Cc: xml-dev@lists.xml.org
> Subject: Re: [xml-dev] ANN: Amara XML Toolkit 0.9.0
> 
> On Thu, 2004-12-23 at 00:53 -0500, John Cowan wrote:
> > Uche Ogbuji scripsit:
> > 
> > > Tenorsax (amara.saxtools.tenorsax) is a framework for "linearizing"
> > > SAX logic so that it flows more naturally, and needs a lot less state
> > > machine wizardry.
> > 
> > This sounds *very* interesting.  Is there a more detailed writeup
> > somewhere?
> 
> Heh.  I should have known.  My focus in documentation was the Bindery
> (data binding) stuff (which I think is very well documented) because I
> figured the initial audience for Amara would be the typical Python
> programmer who grimaces any time he has to deal with anything that smells
> too XMLish (SAX and DOM are contemptible Java-isms to many Pythoneers, and
> don't even get them started on that bloated XSLT thingy).
> 
> Anyway, in focusing on documenting the ultra-Python-friendly Bindery I
> did end up neglecting the other parts a bit.  I plan to catch up, and in
> fact, I plan to treat Tenorsax as a main topic in my upcoming O'Reilly
> article [1], which will cover Amara.
> 
> Just to give an idea of the technique, however, I'll post a few methods
> of a sample Tenorsax handler.
> 
> First a trivial case, just to set the scene:
> 
>     def handle_meta(self, end_condition):
>         name = self.params.get((None, 'name'))
>         content = self.params.get((None, 'content'))
>         print "Meta name:", name, " content:"
>         print content
>         yield None
>         raise StopIteration
> 
> This method handles XHTML meta tags: it looks only at the attributes and
> ignores content.
> 
> end_condition is Tenorsax plumbing.  More on it in a bit.  The first 4
> function body lines just grab attribute values and print them to the
> console.  self.params within a Tenorsax handler always holds the
> parameters of the current SAX event.  Of course, the key to Tenorsax
> linearization is that you actually see multiple SAX events within a
> single method call [2].  Even in this simple handler you see 2 events.
> The start meta tag comes, and then the "yield None" hands control back to
> Tenorsax, and then upon the end meta tag, the code immediately after that
> line resumes, with all the local state intact.  This means that a lot of
> variables you would have usually had to manage across methods in plain
> old SAX become local variables in Tenorsax.  The "raise StopIteration"
> basically signals back to the framework "we're done here".
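
The suspend-and-resume behaviour described above can be reproduced with a
plain generator and no Tenorsax at all.  The sketch below targets current
Python 3, where a generator finishes with "return" (since Python 3.7 an
explicit "raise StopIteration" inside a generator surfaces as a
RuntimeError).  The drive() helper and the attribute dict are hypothetical
stand-ins for the framework, not Amara's API.

```python
def handle_meta(attrs, log):
    # Runs on the start-tag event: attribute values are ordinary locals.
    name = attrs.get("name")
    content = attrs.get("content")
    yield  # suspend until the driver delivers the next event (the end tag)
    # Resumed on the end-tag event; name and content are still in scope.
    log.append((name, content))
    return  # ends the generator; the driver sees StopIteration


def drive(handler, n_events):
    """Advance `handler` once per event; return True if it finished."""
    for _ in range(n_events):
        try:
            next(handler)
        except StopIteration:
            return True
    return False


log = []
attrs = {"name": "author", "content": "Uche"}   # hypothetical sample data
finished = drive(handle_meta(attrs, log), 2)    # start tag, then end tag
```

Nothing about the meta element has to be stored on the handler object
between the two events; the generator frame keeps it.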
> 
> On to a more interesting handler:
> 
>     def handle_p(self, end_condition):
>         yield None
>         content = u''
>         while not self.event == end_condition:
>             if self.event[0] == saxtools.CHARACTER_DATA:
>                 content += self.params
>             yield None
>         #Element closed.  Wrap up
>         print "Document content para:", content
>         raise StopIteration
> 
> This time it's a p element, and it has content, so we get to see
> multiple interesting events in one handler.
> 
> The start tag isn't interesting, so we immediately pass control back to
> Tenorsax ("yield None").  Then content is a local variable that will
> aggregate the text content of the p, which could come in multiple text
> events.  end_condition now comes into play: it's Tenorsax's way of
> letting each handler method know what event signals the end of its scope
> (e.g. the event for close p tag in this case) [3].  Each child text
> event results in another iteration of the loop, and once the end tag is
> seen, we print the accumulated content.
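
To make the loop above concrete, here is a self-contained Python 3 sketch
of the same accumulate-until-end_condition pattern.  The event constants,
the ParaHandler class and the run() driver are all assumptions made up for
illustration; they only mimic the self.event / self.params convention used
in the post and are not Tenorsax itself.

```python
CHARACTER_DATA = "characters"   # hypothetical event-type constants
START_ELEMENT = "start"
END_ELEMENT = "end"


class ParaHandler:
    def __init__(self):
        self.event = None      # key of the current event, set by the driver
        self.params = None     # payload of the current event
        self.paragraphs = []

    def handle_p(self, end_condition):
        yield  # nothing to do for the start tag itself
        chunks = []
        while self.event != end_condition:
            if self.event[0] == CHARACTER_DATA:
                chunks.append(self.params)
            yield  # wait for the next event
        self.paragraphs.append("".join(chunks))


def run(handler, events):
    """Feed (event, params) pairs to handle_p, one per resumption."""
    end_condition = (END_ELEMENT, "p")
    gen = None
    for event, params in events:
        handler.event, handler.params = event, params
        if gen is None and event == (START_ELEMENT, "p"):
            gen = handler.handle_p(end_condition)
        if gen is not None:
            try:
                next(gen)
            except StopIteration:
                gen = None  # handler finished; wait for the next <p>


h = ParaHandler()
run(h, [
    ((START_ELEMENT, "p"), None),
    ((CHARACTER_DATA,), "Hello, "),   # text may arrive in several pieces
    ((CHARACTER_DATA,), "world"),
    ((END_ELEMENT, "p"), None),
])
```

Collecting the pieces in a list and joining once at the end avoids
repeated string concatenation, but is otherwise the same idea as the
"content +=" line in the original handler.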
> 
> Finally, to show more of how handlers are invoked, here's the html:html
> handler:
> 
>     def handle_html(self, end_condition):
>         dispatcher = {
>             (pulldom.START_ELEMENT, XHTML_NS, u'head'):
>             self.handle_head,
>             (pulldom.START_ELEMENT, XHTML_NS, u'body'):
>             self.handle_body,
>             }
>         #Initial call corresponds to the start html element
>         curr_gen = None
>         yield None
>         while not self.event == end_condition:
>             curr_gen = tenorsax.standard_body(dispatcher, curr_gen,
>                                               self.event)
>             yield None
>         #Element closed.  Wrap up
>         raise StopIteration
> 
> dispatcher is a Python dictionary which maps events to handlers.  In
> this case, head start tags get delegated to the self.handle_head method
> and body start tags to the self.handle_body method.  The curr_gen stuff
> is an unfortunate bit of boilerplate I have not yet been able to refine
> away (working on it).  Every now and then I wish Python had macros.
> They would help a lot here.  tenorsax.standard_body automatically checks
> the current event to see if there's a match for delegating to one of the
> methods indicated in dispatcher.
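
The post does not show what tenorsax.standard_body does internally, so the
following is only one plausible reading of the description above, written
as a self-contained Python 3 sketch.  The standard_body function here, the
event tuples and the Handler class are hypothetical; the real Amara code
may well differ.

```python
START_ELEMENT, END_ELEMENT = "start", "end"   # hypothetical event types


def standard_body(dispatcher, curr_gen, event):
    """Hypothetical stand-in for tenorsax.standard_body.

    Resume the active child generator if there is one; otherwise start the
    handler registered for `event`, if any.  Returns the generator that is
    still active, or None.
    """
    if curr_gen is None:
        handler = dispatcher.get(event)
        if handler is None:
            return None  # no child handler is interested in this event
        # Assumption: the child's scope ends at the matching end event.
        curr_gen = handler((END_ELEMENT,) + event[1:])
    try:
        next(curr_gen)
    except StopIteration:
        return None  # child finished; the parent sees events again
    return curr_gen


class Handler:
    def __init__(self):
        self.event = None
        self.seen = []

    def handle_head(self, end_condition):
        self.seen.append("head open")
        yield
        while self.event != end_condition:
            yield  # ignore everything inside head
        self.seen.append("head closed")

    def handle_html(self, end_condition):
        dispatcher = {(START_ELEMENT, "head"): self.handle_head}
        curr_gen = None
        yield  # initial call corresponds to the start html element
        while self.event != end_condition:
            curr_gen = standard_body(dispatcher, curr_gen, self.event)
            yield


h = Handler()
top = h.handle_html((END_ELEMENT, "html"))
for ev in [(START_ELEMENT, "html"), (START_ELEMENT, "head"),
           (START_ELEMENT, "title"), (END_ELEMENT, "title"),
           (END_ELEMENT, "head"), (END_ELEMENT, "html")]:
    h.event = ev
    try:
        next(top)
    except StopIteration:
        break
```

Under this reading, curr_gen is simply "the child handler that currently
owns the event stream", which is why the parent has to thread it through
every call.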
> 
> I'd like to tidy things up a tad bit more, but as it is, I have found
> Tenorsax to be a huge help in writing SAX programs quickly.  The
> Scimitar code that translates Schematron to Python code is implemented
> in only about 400 lines of Python code (excluding comments, spacing,
> etc.), and this includes all the Python skeleton code for emitted
> validator scripts.  I tried implementing it in plain SAX at first.  It
> was running to 2-3 times the code length and my brain was on the verge
> of explosion from the state machine logic.
> 
> Anyway, thanks for asking, and thus helping me seed the documentation.
> More on Tenorsax to come, for sure, because I do think many will find it
> very useful.
> 
> [1] http://www.xml.com/pub/au/84
> 
> [2] For those who care about the nuts and bolts the trick here is
> basically a semi-co-routine arrangement between the Tenorsax framework
> and each handler method in turn.  This is made possible by Python
> generators.  Full co-routines are not really in the cards with Python at
> present, but I'm not convinced they'd make more than a cosmetic
> difference.
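
For readers who want to see the nuts and bolts of such an arrangement
against the standard library, here is a minimal Python 3 sketch that
bridges xml.sax callbacks (push) into a generator that reads events in a
straight line (pull).  It relies on generator send(), which was added to
Python after this message was written, and it is not how Tenorsax is
implemented; GeneratorBridge and collect_titles are names invented for the
example.

```python
import xml.sax


def collect_titles(out):
    """Linear logic: each `yield` receives the next (kind, data) event."""
    while True:
        kind, data = yield
        if (kind, data) == ("start", "title"):
            chunks = []
            kind, data = yield
            while (kind, data) != ("end", "title"):
                if kind == "chars":
                    chunks.append(data)
                kind, data = yield
            out.append("".join(chunks))


class GeneratorBridge(xml.sax.ContentHandler):
    """Forwards each SAX callback to the generator as one event."""

    def __init__(self, gen):
        super().__init__()
        self._gen = gen
        next(self._gen)  # prime the generator: run it to its first yield

    def startElement(self, name, attrs):
        self._gen.send(("start", name))

    def endElement(self, name):
        self._gen.send(("end", name))

    def characters(self, content):
        self._gen.send(("chars", content))


titles = []
doc = b"<html><head><title>Hello</title></head><body><p>x</p></body></html>"
xml.sax.parseString(doc, GeneratorBridge(collect_titles(titles)))
```

The parser still drives everything by calling back into the handler; the
generator only gives the application code the appearance of pulling.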
> 
> [3] This is a simplified case that doesn't handle nested p tags.
> Supporting nesting is a pretty simple matter.
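
One possible way to support nesting, sketched in Python 3, is to count
open p elements and stop only when the count returns to zero.  This is an
assumption about what "supporting nesting" could look like, not the
approach Tenorsax necessarily takes; the state dict and event tuples are
hypothetical.

```python
def handle_p(state, start_event, end_event, out):
    """Handler generator; `state` is a dict the driver updates per event."""
    yield  # start tag of the outermost p
    depth, chunks = 1, []
    while True:
        event = state["event"]
        if event == start_event:
            depth += 1            # a nested p opened
        elif event == end_event:
            depth -= 1
            if depth == 0:
                break             # the outermost p closed
        elif event[0] == "chars":
            chunks.append(state["params"])
        yield
    out.append("".join(chunks))


state, out = {}, []
gen = handle_p(state, ("start", "p"), ("end", "p"), out)
events = [
    (("start", "p"), None), (("chars",), "a"),
    (("start", "p"), None), (("chars",), "b"), (("end", "p"), None),
    (("chars",), "c"), (("end", "p"), None),
]
for event, params in events:
    state["event"], state["params"] = event, params
    try:
        next(gen)
    except StopIteration:
        break
```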
> 
> -- 
> Uche Ogbuji                                    Fourthought, Inc.
> http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
> Use CSS to display XML - 
> http://www.ibm.com/developerworks/edu/x-dw-x-xmlcss-i.html
> Full XML Indexes with Gnosis - 
> http://www.xml.com/pub/a/2004/12/08/py-xml.html
> Be humble, not imperial (in design) - 
> http://www.adtmag.com/article.asp?id=10286
> UBL 1.0 - 
> http://www-106.ibm.com/developerworks/xml/library/x-think28.html
> Use Universal Feed Parser to tame RSS - 
> http://www.ibm.com/developerworks/xml/library/x-tipufp.html
> Default and error handling in XSLT lookup tables - 
> http://www.ibm.com/developerworks/xml/library/x-tiplook.html
> A survey of XML standards - 
> http://www-106.ibm.com/developerworks/xml/library/x-stand4/
> The State of Python-XML in 2004 - 
> http://www.xml.com/pub/a/2004/10/13/py-xml.html
> 
> 





 
