   Re: [xml-dev] More on taming SAX (was Re: [xml-dev] ANN: Amara XMLToolki


Alan Gutierrez wrote:

>* Jeff Rafter <lists@jeffrafter.com> [2004-12-23 13:43]:
>>>While on the topic of SAX taming features in Amara, there is also
>>>amara.saxtools.xpattern_sax_state_machine, which I didn't even bother
>>>mentioning in the announcement (too much to cram in).
>>Can you expand on your expansion? As I was reading this I was thinking 
>>that in the Java/C# world an interesting approach would be to keep a 
>>pseudo DOM stack for the event hierarchy. Maybe something where you keep 
>>everything at an ancestral level intact while parsing
>>  <bar1>
>>    <baz1/>
>>    <baz2/>
>>  </bar1>
>>  <bar2>
>>    <baz1>
>>      <sub/>
>>    </baz1>
>>    <baz2>text</baz2>
>>  </bar2>
>>So when the event stream reached /foo/bar2/baz2/text() you would have 
>>the following in a DOM like structure:
>>  foo
>>    \
>>     bar1 (... no children)
>>     bar2
>>       \
>>        baz1 (... no children, just the previous sibling and attrs)
>>        baz2 (only the StartTag)
>>I am not sure that the preceding siblings would be very useful, and they 
>>raise the chance of pathological cases, but when I construct mini-trees 
>>this is the subset I find handy. It is useful when working with an editor to
>>understand the immediate context. Unfortunately by requiring the 
>>previous siblings you have to maintain quite a bit more... the whole 
>>preceding branch of the tree.
>    I have a SAX library (in Java) that keeps the stack around, but
>    not the preceding siblings. It is quite useful.
>    It is, actually, very useful to keep a stack around that has a
>    hash table for each level of the stack; it allows for the
>    development of strategies that are themselves stateless.
>    Adding the implied stack goes a long way to make SAX event
>    processing a more practical solution for a lot of problems.

Yes.  This is a useful technique I covered for Python in my article 
"Location, Location, Location 


I think that while useful this technique can still leave a lot of state 
wrangling to the programmer, which is why Amara has several modules that 
go further.
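
For reference, the plain stack-per-level technique discussed above might look roughly like this in Python with the standard xml.sax module. This is only a minimal sketch, not Amara's API and not the code from the article: the StackHandler class, its found list, and the "text" key in the per-level dictionary are hypothetical names chosen for illustration, and the sample document assumes a foo root element around the bar1/bar2 fragment quoted earlier.

```python
import xml.sax


class StackHandler(xml.sax.ContentHandler):
    """Keeps a stack of open elements, each with its own scratch dict."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one (name, attrs, state) entry per open element
        self.found = []   # (path, text) pairs recorded by endElement

    def path(self):
        # Location of the current element, built from the ancestor stack.
        return "/" + "/".join(name for name, _, _ in self.stack)

    def startElement(self, name, attrs):
        # The per-level dict holds whatever state a rule needs at this depth.
        self.stack.append((name, dict(attrs.items()), {"text": []}))

    def characters(self, content):
        if self.stack:
            self.stack[-1][2]["text"].append(content)

    def endElement(self, name):
        _, _, state = self.stack[-1]
        text = "".join(state["text"]).strip()
        if text:
            self.found.append((self.path(), text))
        # Popping discards this level's state; ancestors stay intact.
        self.stack.pop()


# Assumed sample: the quoted fragment wrapped in a <foo> root element.
SAMPLE = (b"<foo><bar1><baz1/><baz2/></bar1>"
          b"<bar2><baz1><sub/></baz1><baz2>text</baz2></bar2></foo>")

handler = StackHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.found)
```

Because each level carries its own dictionary, a rule can stash what it needs there and the state disappears when the element closes. Deciding which rule fires at which path is still left to the programmer.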

Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Use CSS to display XML - http://www.ibm.com/developerworks/edu/x-dw-x-xmlcss-i.html
Full XML Indexes with Gnosis - http://www.xml.com/pub/a/2004/12/08/py-xml.html
Be humble, not imperial (in design) - http://www.adtmag.com/article.asp?id=10286
UBL 1.0 - http://www-106.ibm.com/developerworks/xml/library/x-think28.html
Use Universal Feed Parser to tame RSS - http://www.ibm.com/developerworks/xml/library/x-tipufp.html
Default and error handling in XSLT lookup tables - http://www.ibm.com/developerworks/xml/library/x-tiplook.html
A survey of XML standards - http://www-106.ibm.com/developerworks/xml/library/x-stand4/
The State of Python-XML in 2004 - http://www.xml.com/pub/a/2004/10/13/py-xml.html

