xml-dev - Re: [xml-dev] More on taming SAX (was Re: [xml-dev] ANN: Amara XML Toolkit 0.9.0)

To: Uche Ogbuji <uche.ogbuji@fourthought.com>
Subject: Re: [xml-dev] More on taming SAX (was Re: [xml-dev] ANN: Amara XML Toolkit 0.9.0)
From: Alan Gutierrez <alan-xml-dev@engrm.com>
Date: Sun, 26 Dec 2004 15:00:08 -0500
Cc: Jeff Rafter <lists@jeffrafter.com>, xml-dev@lists.xml.org
In-reply-to: <41CEFEED.7000802@fourthought.com>
Mail-followup-to: Uche Ogbuji <uche.ogbuji@fourthought.com>,Jeff Rafter <lists@jeffrafter.com>, xml-dev@lists.xml.org
References: <1103757598.10272.67.camel@borgia> <20041223055359.GK25900@skunk.reutershealth.com> <1103791522.4600.68.camel@borgia> <1103792754.4600.81.camel@borgia> <41CB11D1.50600@jeffrafter.com> <20041225094043.GA14697@maribor.izzy.net> <41CEFEED.7000802@fourthought.com>
User-agent: Mutt/1.4.1i

* Uche Ogbuji <uche.ogbuji@fourthought.com> [2004-12-26 13:12]:
> Alan Gutierrez wrote:
> 
> >* Jeff Rafter <lists@jeffrafter.com> [2004-12-23 13:43]:
> > 
> >
> >>>While on the topic of SAX taming features in Amara, there is also
> >>>amara.saxtools.xpattern_sax_state_machine, which I didn't even bother
> >>>mentioning in the announcement (too much to cram in).
> >>>     
> >>>
> >>Can you expand on your expansion? As I was reading this I was thinking 
> >>that in the Java/C# world an interesting approach would be to keep a 
> >>pseudo-DOM stack for the event hierarchy, perhaps keeping everything at 
> >>an ancestral level intact while parsing:
> >>
> >><foo>
> >> <bar1>
> >>   <baz1/>
> >>   <baz2/>
> >> </bar1>
> >> <bar2>
> >>   <baz1>
> >>     <sub/>
> >>   </baz1>
> >>   <baz2>text</baz2>
> >> </bar2>
> >></foo>
> >>
> >>So when the event stream reached /foo/bar2/baz2/text() you would have 
> >>the following in a DOM-like structure:
> >>
> >> foo
> >>   \
> >>    bar1 (... no children)
> >>    bar2
> >>      \
> >>       baz1 (... no children, just the previous sibling and attrs)
> >>       baz2 (only the StartTag)
> >>
> >>I am not sure the preceding siblings would be very useful, and they 
> >>create more chances for pathological cases, but when I construct 
> >>mini-trees this is the subset I find handy. It is useful when working 
> >>with an editor to understand the immediate context. Unfortunately, by 
> >>requiring the previous siblings you have to maintain quite a bit more: 
> >>the whole preceding branch of the tree.
> >
> >   I have a SAX library (in Java) that keeps the stack around, but
> >   not the preceding siblings. It is quite useful.
> >
> >   It is, actually, very useful to keep a stack around that has a
> >   hash table for each level of the stack; it allows for the
> >   development of strategies that are themselves stateless.
> >
> >   Adding the implied stack goes a long way toward making SAX event
> >   processing a more practical solution for a lot of problems.
>
> Yes.  This is a useful technique I covered for Python in my article 
> "Location, Location, Location":
>
> http://www.xml.com/pub/a/2004/11/24/py-xml.html
>
> I think that while useful this technique can still leave a lot of state 
> wrangling to the programmer, which is why Amara has several modules that 
> go further.
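A minimal Java sketch of the mini-tree Jeff describes, assuming the JDK's built-in JAXP/SAX parser. Each open element keeps a copy of its start-tag attributes plus the names of children that have already closed, so at any text node you can see the ancestors and their preceding siblings without building a full DOM. The names MiniTreeHandler, Frame and run are illustrative only, and keeping just the names of closed siblings (not their attributes) is a simplification of Jeff's picture.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Keeps open ancestors (with attributes) plus the names of their already-closed children. */
final class MiniTreeHandler extends DefaultHandler {
    /** One open element: its start tag and the names of children that have already ended. */
    static final class Frame {
        final String name;
        final Attributes attrs;
        final List<String> closedChildren = new ArrayList<>();

        Frame(String name, Attributes attrs) {
            this.name = name;
            this.attrs = new AttributesImpl(attrs); // parsers may reuse Attributes, so copy
        }
    }

    private final Deque<Frame> stack = new ArrayDeque<>();
    private final List<String> snapshots = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        stack.push(new Frame(qName, atts));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        Frame done = stack.pop();
        if (!stack.isEmpty()) {
            // Only the name of the finished child survives; its subtree is dropped.
            stack.peek().closedChildren.add(done.name);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Sketch only: a parser may split one text node across several calls.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            snapshots.add(describe() + " text=" + text);
        }
    }

    /** Renders the context root-first, e.g. /foo[bar1]/bar2[baz1]/baz2[]. */
    private String describe() {
        StringBuilder sb = new StringBuilder();
        Iterator<Frame> it = stack.descendingIterator(); // bottom of the stack is the root
        while (it.hasNext()) {
            Frame f = it.next();
            sb.append('/').append(f.name).append(f.closedChildren);
        }
        return sb.toString();
    }

    static List<String> run(String xml) throws Exception {
        MiniTreeHandler h = new MiniTreeHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), h);
        return h.snapshots;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<foo><bar1><baz1/><baz2/></bar1>"
                + "<bar2><baz1><sub/></baz1><baz2>text</baz2></bar2></foo>";
        run(xml).forEach(System.out::println); // prints /foo[bar1]/bar2[baz1]/baz2[] text=text
    }
}
```

Because each closed child collapses to a name in its parent's list, memory is bounded by the open ancestors and their preceding siblings rather than by the whole document, which is the trade-off Jeff points out.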

    Yes. A lot is still left to the programmer with my tool set, but
    it does pick up a lot of common SAX tasks.
    
    I've wondered about what more I could do.

    Hmm. I've read the article. I was talking about how I keep a stack
    of the elements around, and how a silly thing I did turned out to
    be very useful: in the stack of events, for each event, I keep a
    java.util.Map and tuck all sorts of things in there.
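A rough Java sketch of the stack-of-maps approach described above, assuming each element gets a fresh java.util.HashMap and that a document-level map sits at the bottom of the stack to collect results. The Strategy interface and the ChildCounter example are hypothetical, not taken from any particular library; the point is that ChildCounter has no fields, so all of its state lives in the maps.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical callback that keeps no fields; all state lives in the stack's maps. */
interface Strategy {
    void start(Deque<Map<String, Object>> stack, String name, Attributes atts);
    void end(Deque<Map<String, Object>> stack, String name);
}

/** Example strategy: records "name:childCount" for every element. */
final class ChildCounter implements Strategy {
    @Override
    public void start(Deque<Map<String, Object>> stack, String name, Attributes atts) {
        Iterator<Map<String, Object>> it = stack.iterator();
        it.next(); // the new element's own map
        it.next().merge("children", 1, (a, b) -> (Integer) a + (Integer) b); // parent's map
    }

    @Override
    public void end(Deque<Map<String, Object>> stack, String name) {
        Object count = stack.peek().getOrDefault("children", 0);
        @SuppressWarnings("unchecked")
        List<String> report = (List<String>) stack.peekLast()
                .computeIfAbsent("report", k -> new ArrayList<String>());
        report.add(name + ":" + count);
    }
}

final class MapStackHandler extends DefaultHandler {
    private final Deque<Map<String, Object>> stack = new ArrayDeque<>();
    private final List<Strategy> strategies;

    MapStackHandler(List<Strategy> strategies) {
        this.strategies = strategies;
    }

    @Override
    public void startDocument() {
        stack.clear();
        stack.push(new HashMap<>()); // document-level map, never popped
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        stack.push(new HashMap<>()); // fresh scratch map for this element
        for (Strategy s : strategies) s.start(stack, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        for (Strategy s : strategies) s.end(stack, qName);
        stack.pop(); // the element's map goes away with it
    }

    static Map<String, Object> run(String xml, List<Strategy> strategies) throws Exception {
        MapStackHandler h = new MapStackHandler(strategies);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), h);
        return h.stack.peekLast();
    }

    public static void main(String[] args) throws Exception {
        Map<String, Object> doc = run("<foo><bar1><baz1/><baz2/></bar1><bar2/></foo>",
                List.of(new ChildCounter()));
        System.out.println(doc.get("report")); // [baz1:0, baz2:0, bar1:2, bar2:0, foo:2]
    }
}
```

Since ChildCounter holds nothing between calls, the same instance can be reused across documents; only the handler's stack carries state.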

    Twice now I've created a little language in XML and used SAX to
    parse it. Once I understood what I could and could not do, it
    got pretty easy to express a chore as an XML event stream. It
    was easy to keep track of the chore by tucking state into the
    java.util.Map. Kinda Perlish, but that's me.
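To illustrate expressing a chore as an XML event stream, here is a sketch that evaluates a made-up arithmetic language (sum, product and n elements) in a single SAX pass, tucking each element's partial results into its own map. The element names and the TinyCalc class are invented for this example and are not from the library discussed in the thread.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Evaluates a hypothetical <sum>/<product>/<n> language in one SAX pass. */
final class TinyCalc extends DefaultHandler {
    private final Deque<Map<String, Object>> stack = new ArrayDeque<>();
    private long result;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Map<String, Object> frame = new HashMap<>();
        frame.put("values", new ArrayList<Long>()); // results of finished children
        frame.put("text", new StringBuilder());     // character data seen so far
        stack.push(frame);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!stack.isEmpty()) {
            ((StringBuilder) stack.peek().get("text")).append(ch, start, length);
        }
    }

    @SuppressWarnings("unchecked")
    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        Map<String, Object> frame = stack.pop();
        List<Long> values = (List<Long>) frame.get("values");
        long value;
        switch (qName) {
            case "n":
                value = Long.parseLong(frame.get("text").toString().trim());
                break;
            case "sum":
                value = values.stream().mapToLong(Long::longValue).sum();
                break;
            case "product":
                value = values.stream().reduce(1L, (a, b) -> a * b);
                break;
            default:
                throw new SAXException("unknown element: " + qName);
        }
        if (stack.isEmpty()) {
            result = value; // root element finished
        } else {
            ((List<Long>) stack.peek().get("values")).add(value); // hand result to parent
        }
    }

    static long eval(String xml) throws Exception {
        TinyCalc h = new TinyCalc();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), h);
        return h.result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(eval("<sum><n>1</n><product><n>2</n><n>3</n></product></sum>")); // 7
    }
}
```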

    I was wondering if I couldn't specify some of those individual
    chores within an XML Schema document. When a certain object is
    found in the event stream, according to XML Schema, Java source
    could be executed, perhaps as a generated class with member
    variables mapped to attributes or the values of children.
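One way the binding idea might look, sketched with reflection instead of XML Schema: a hand-written map from element names to classes stands in for whatever a schema-driven generator would produce, and attributes are copied into same-named String fields. Point, BindingHandler and bind are hypothetical names; a real generator would also need type conversion and handling for child-element values.

```java
import java.io.StringReader;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical hand-written stand-in for a class a schema-driven generator might emit. */
final class Point {
    String x;
    String y;

    @Override
    public String toString() {
        return "Point(" + x + "," + y + ")";
    }
}

/** Instantiates a bound class per matching element and copies attributes into same-named fields. */
final class BindingHandler extends DefaultHandler {
    private final Map<String, Class<?>> bindings;
    private final List<Object> bound = new ArrayList<>();

    BindingHandler(Map<String, Class<?>> bindings) {
        this.bindings = bindings;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        Class<?> type = bindings.get(qName);
        if (type == null) {
            return; // elements without a binding pass through untouched
        }
        try {
            Object obj = type.getDeclaredConstructor().newInstance();
            for (int i = 0; i < atts.getLength(); i++) {
                try {
                    Field f = type.getDeclaredField(atts.getQName(i));
                    f.setAccessible(true);
                    f.set(obj, atts.getValue(i)); // sketch assumes String fields only
                } catch (NoSuchFieldException ignored) {
                    // attribute with no matching field: skipped
                }
            }
            bound.add(obj);
        } catch (ReflectiveOperationException e) {
            throw new SAXException(e);
        }
    }

    static List<Object> bind(String xml, Map<String, Class<?>> bindings) throws Exception {
        BindingHandler h = new BindingHandler(bindings);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), h);
        return h.bound;
    }

    public static void main(String[] args) throws Exception {
        List<Object> objs = bind("<shape><point x='1' y='2'/><point x='3' y='4'/></shape>",
                Map.<String, Class<?>>of("point", Point.class));
        System.out.println(objs); // [Point(1,2), Point(3,4)]
    }
}
```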

    I've thought about adding an XPath tracker to my library for error
    reporting. It would be very simple to add at this point, and I
    think it's necessary, because the document locator loses meaning
    when I chain together a bunch of SAX filters.
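A sketch of such an XPath tracker written as a SAX filter, assuming org.xml.sax.helpers.XMLFilterImpl and the JDK's default parser. It maintains a positional path such as /doc[1]/item[2] and prefixes it to errors thrown by the downstream handler, so the report still makes sense when the Locator no longer does. PathTracker, currentPath and firstError are illustrative names, and this version only annotates errors raised from startElement.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** SAX filter that tracks a positional path and prefixes it to downstream errors. */
final class PathTracker extends XMLFilterImpl {
    // Per open element (plus one for the document): child name -> occurrences so far.
    private final Deque<Map<String, Integer>> counts = new ArrayDeque<>();
    private final Deque<String> steps = new ArrayDeque<>();

    PathTracker(XMLReader parent) {
        super(parent);
    }

    /** Path of the current element, e.g. /doc[1]/item[2]. */
    String currentPath() {
        StringBuilder sb = new StringBuilder();
        steps.descendingIterator().forEachRemaining(s -> sb.append('/').append(s));
        return sb.toString();
    }

    @Override
    public void startDocument() throws SAXException {
        counts.clear();
        steps.clear();
        counts.push(new HashMap<>());
        super.startDocument();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        int n = counts.peek().merge(qName, 1, Integer::sum);
        steps.push(qName + "[" + n + "]");
        counts.push(new HashMap<>());
        try {
            super.startElement(uri, localName, qName, atts);
        } catch (SAXException e) {
            // Only startElement errors are annotated in this sketch.
            throw new SAXException(currentPath() + ": " + e.getMessage(), e);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, qName);
        counts.pop();
        steps.pop();
    }

    /** Parses xml with a downstream handler that rejects elements named badName. */
    static String firstError(String xml, String badName) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        PathTracker tracker = new PathTracker(reader);
        tracker.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String u, String l, String q, Attributes a)
                    throws SAXException {
                if (q.equals(badName)) {
                    throw new SAXException("unexpected <" + q + ">");
                }
            }
        });
        try {
            tracker.parse(new InputSource(new StringReader(xml)));
            return null; // no error
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstError("<doc><item/><item><bad/></item></doc>", "bad"));
        // prints /doc[1]/item[2]/bad[1]: unexpected <bad>
    }
}
```

Wrapping characters() and endElement() in the same try/catch would extend the annotation to the remaining callbacks; further filters can sit downstream of the tracker.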

    In any case, I'm reading through some of the other articles
    you've been posting. This is a very interesting discussion.

    Cheers.

--
Alan Gutierrez - alan@engrm.com
