xml-dev - Re: [xml-dev] parser models

To: Arjun Ray <aray@nyct.net>
Subject: Re: [xml-dev] parser models
From: Aleksander Slominski <aslom@cs.indiana.edu>
Date: Wed, 25 Sep 2002 11:26:34 -0500
Cc: xml-dev@lists.xml.org
References: <r01050300-1015-399AD282CE5611D6B76F0003937A08C2@[192.168.124.21]> <200209232145.RAA03126@mail2.reutershealth.com> <skf0pugrt2v186al9126s64sdn2g062l0b@4ax.com> <3D90CBE1.3FE84B2B@cs.indiana.edu> <o0k1pu4tpacb45jtomkulhvpcbkigjs491@4ax.com> <3D9136AF.8D20A724@cs.indiana.edu> <bei2pu4gdef74abmc944j90tujv0d25a76@4ax.com>

Arjun Ray wrote:

> | can be expressed easier using pull parsing and is both easier to understand
> | and to maintain (debug)
>
> In the "databinding" applications I've seen, the structures rarely go more
> than three significant levels deep, have no recursive structures in the
> data, and tend to consist of sequences in fixed/predictable order.  It's
> all very flat and a pull approach can work well.

hi,

i have done all of this with pull parsing (and with SAX as well): transporting
graphs that have a recursive structure, handling unordered child elements,
and processing graphs that are very deep ...
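
As a rough illustration of the recursive case, here is a minimal recursive-descent sketch. It uses StAX (`javax.xml.stream`, the pull API bundled with the JDK) rather than XmlPull. The `<node name="...">` format and the `TreePull` class are hypothetical. Each `<node>` is handled by one method call, so the Java call stack mirrors the XML nesting.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Recursive-descent pull parsing of a hypothetical recursive <node> format. */
class TreePull {
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        /** Number of nodes in this subtree, including this one. */
        int size() {
            int n = 1;
            for (Node c : children) n += c.size();
            return n;
        }

        /** Number of levels in this subtree; a leaf has depth 1. */
        int depth() {
            int d = 0;
            for (Node c : children) d = Math.max(d, c.depth());
            return d + 1;
        }
    }

    static Node parse(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            r.nextTag(); // advance from START_DOCUMENT to the root start tag
            return readNode(r);
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed input", e);
        }
    }

    /** Expects the reader on a <node> start tag; leaves it on the matching end tag. */
    private static Node readNode(XMLStreamReader r) throws XMLStreamException {
        r.require(XMLStreamConstants.START_ELEMENT, null, "node");
        Node node = new Node(r.getAttributeValue(null, "name"));
        // nextTag() skips whitespace and comments and stops at the next start or end tag.
        while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
            node.children.add(readNode(r)); // recursion follows the XML nesting
        }
        r.require(XMLStreamConstants.END_ELEMENT, null, "node");
        return node;
    }
}
```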

> | i do not think it is that simple. take for example SAX parsing code
> | generated by JaxMe (see *Handler.java)
> |
> |   http://www.extreme.indiana.edu/~aslom/xml/databinding/jaxme-xmlpull/jaxme_generated/phone/
>
> You're right, it's very cluttered.  But I wasn't talking about SAX,
> either.
>
> | and compare it with code that is using xml pull parsing
> |
> |   http://www.extreme.indiana.edu/~aslom/xml/databinding/jaxme-xmlpull/phone/
> |
> | they are both equivalent in functionality but which is easier to understand?
>
> Neither one is particularly easy or difficult.  Think "databinding", watch
> for field assignments, and the rest is framework overhead.  If you know
> the framework - always a big if! ;-) - the overhead becomes scrutable as
> you've learned to recognize the boilerplate ("yeah, that's how it's
> supposed to go").

boilerplate code can be generated by JAXB or, even better, constructed
dynamically and provided on the fly using dynamic proxies (if the XML schema
can be mapped to an existing Java interface) ...
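
A minimal sketch of the dynamic-proxy idea follows. It assumes a flat XML document whose child element names match the getter names of a Java interface. `Phone`, `ProxyBinding`, and the `getFoo()` to `<foo>` naming rule are all hypothetical. Only `java.lang.reflect.Proxy` and StAX are real JDK APIs.

```java
import java.io.StringReader;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical target interface; each getter maps to a child element by name. */
interface Phone {
    String getNumber();
    String getType();
}

class ProxyBinding {
    /** Reads <root><child>text</child>...</root> into a map of child name -> text. */
    static Map<String, String> readFlat(String xml) {
        Map<String, String> values = new HashMap<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            r.nextTag(); // root start tag
            while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
                String name = r.getLocalName();
                values.put(name, r.getElementText()); // leaves the reader on the child's end tag
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed input", e);
        }
        return values;
    }

    /** Builds an implementation of iface on the fly: getFoo() returns the text of <foo>. */
    static <T> T bind(Class<T> iface, String xml) {
        Map<String, String> values = readFlat(xml);
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] { iface },
                (p, method, args) -> {
                    String m = method.getName();
                    if (m.startsWith("get") && m.length() > 3 && (args == null || args.length == 0)) {
                        String element = Character.toLowerCase(m.charAt(3)) + m.substring(4);
                        return values.get(element);
                    }
                    throw new UnsupportedOperationException(m);
                });
        return iface.cast(proxy);
    }
}
```

A real binding would also need type conversion, nested objects, and `equals`/`hashCode`/`toString` handling. This proxy rejects any method it does not recognize.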

> All the magic is supposed to happen in the endChild(child) call.  I did
> say my example was cheesy - the point was to focus on how the endChild()
> method fit into the framework.  Writing an app exactly this way has a
> number of avoidable problems (such as the one I was complaining about!):
> the better approach is to separate the implementations of Element and
> Content interfaces.

i will wait to see a better example ...

> |> I just find a push API more amenable to separation of functions.
> |
> | i am not sure what does it mean? what are the functions you have in
> | mind?
>
> By "functions" I didn't mean function/method calls.  I meant "things to be
> done and variations thereof".  Lots of classes/objects, polymorphism (for
> dispatch - no switch(){} logic!), and subclassing for customization.  It
> isn't everyone's ticket.

i am not sure how many functions are needed when processing XML.
what comes to mind is: tokenize the XML, produce XML events, and
process them by doing _something_ ...
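
Those three steps can be sketched with StAX's event API. The parser handles tokenizing and event production, and a caller-supplied `Consumer` does the _something_. `ThreeStages` and `countElements` are illustrative names, not part of any library.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class ThreeStages {
    /** Steps 1 and 2: the StAX implementation tokenizes the text and yields XML events. */
    static void run(String xml, Consumer<XMLEvent> processor) {
        try {
            XMLEventReader events = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (events.hasNext()) {
                processor.accept(events.nextEvent()); // step 3: do _something_ with each event
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed input", e);
        }
    }

    /** One possible "something": count start tags by element name. */
    static Map<String, Integer> countElements(String xml) {
        Map<String, Integer> counts = new TreeMap<>();
        run(xml, e -> {
            if (e.isStartElement()) {
                String name = e.asStartElement().getName().getLocalPart();
                counts.merge(name, 1, Integer::sum);
            }
        });
        return counts;
    }
}
```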

> Have you seen Oleg Kiselyov's foldts recursion scheme?
>
>  http://pobox.com/~oleg/ftp/papers/XML-parsing.ps.gz
>
> Passing "seeds" up and down a tree is similar to the patterns I'm trying
> to develop.
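
For readers who have not seen the paper, here is a rough Java transcription of the foldts traversal as understood here. `fdown` derives the seed handed to an element's children. `fhere` folds a text leaf into the current seed. `fup` combines the parent's seed with the seed returned by the last child. The `Tree`/`Leaf`/`Elem` types and the bracketing example are assumptions made for illustration, not code from the paper.

```java
import java.util.Arrays;
import java.util.List;

/** An assumed, simplified Java transcription of the foldts tree fold. */
class FoldTs {
    /** A tree node is either a text leaf or an element with children. */
    interface Tree {}

    static final class Leaf implements Tree {
        final String text;
        Leaf(String text) { this.text = text; }
    }

    static final class Elem implements Tree {
        final String name;
        final List<Tree> kids;
        Elem(String name, Tree... kids) { this.name = name; this.kids = Arrays.asList(kids); }
    }

    interface Down<S> { S apply(S seed, Elem node); }                     // seed for the children
    interface Up<S>   { S apply(S parentSeed, S lastKidSeed, Elem node); } // seed after the element
    interface Here<S> { S apply(S seed, String text); }                   // seed after a leaf

    /** The seed is threaded through the calls as an argument and a return value. */
    static <S> S foldts(Down<S> fdown, Up<S> fup, Here<S> fhere, S seed, Tree tree) {
        if (tree instanceof Leaf) {
            return fhere.apply(seed, ((Leaf) tree).text);
        }
        Elem elem = (Elem) tree;
        S kidSeed = fdown.apply(seed, elem);
        for (Tree kid : elem.kids) {
            kidSeed = foldts(fdown, fup, fhere, kidSeed, kid); // each kid gets its left sibling's result
        }
        return fup.apply(seed, kidSeed, elem);
    }

    /** Example: rebuild a bracketed outline of the tree. */
    static String outline(Tree tree) {
        return FoldTs.<String>foldts(
                (seed, node) -> "",                                         // children start fresh
                (parent, kids, node) -> parent + "[" + node.name + ":" + kids + "]",
                (seed, text) -> seed + text,
                "",
                tree);
    }
}
```

In this transcription no seed is shared between handlers. Each call receives a seed and returns a new one.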

i remember this paper. it has a questionable comparison with an expat
application that reads its input char-by-char (instead of through a buffered
stream) and is described as a 'well-written' C application ... when compared
with a correctly buffered version, expat is 10x faster than SSAX, not
just 1.4x faster ... (memory utilization was not reported, but i am sure
that difference was also significant)
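
The buffering point can be illustrated in Java. This is not C and not a reproduction of the paper's benchmark. The sketch counts how many read calls reach the underlying stream when a tokenizer consumes input one byte at a time, with and without `BufferedInputStream`. Call counts stand in for system calls here; the sketch makes no timing claims, and `CountingStream` is a hypothetical helper.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BufferingDemo {
    /** Wraps a stream and counts how many read calls reach it. */
    static final class CountingStream extends FilterInputStream {
        int calls;

        CountingStream(InputStream in) { super(in); }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Consumes `size` bytes one at a time, optionally through a buffer; returns underlying calls. */
    static int countCalls(int size, boolean buffered) {
        CountingStream source = new CountingStream(new ByteArrayInputStream(new byte[size]));
        InputStream in = buffered ? new BufferedInputStream(source) : source;
        try {
            while (in.read() != -1) {
                // one byte per call, like a char-by-char tokenizer
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.calls;
    }
}
```

With 10,000 bytes, the unbuffered loop makes one underlying call per byte plus one for end of input. The buffered loop makes only a handful, because `BufferedInputStream` refills in 8 KB chunks by default.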

in the paper the author presented only a trivial example of accumulating
character content, and that is far easier to do in pure SAX
(just keep appending the characters passed to the characters() callback ...)
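
The following minimal SAX sketch shows that accumulation, using the JDK's `javax.xml.parsers` and `org.xml.sax` APIs. `SaxText.textOf` is a hypothetical helper name. A parser may deliver one text node in several `characters()` calls, so the handler appends every chunk rather than assuming one call per node.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxText {
    /** Returns the concatenated character content of the whole document. */
    static String textOf(String xml) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // just keep appending each chunk
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("parse failed", e);
        }
        return text.toString();
    }
}
```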

one thing i did not get: isn't the "seed" a global variable that is shared by
all handlers in SSAX:make-parser/foldts? also, how are dispatching
decisions handled, for example if <table> may contain both <th> and <tr>
in any order ...
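
For comparison, a pull parser handles that dispatch as an ordinary branch on the current element name inside a loop, so `<th>` and `<tr>` can arrive in any order. Below is a minimal StAX sketch, assuming simplified `<table>` markup whose `<tr>` elements contain only `<td>` text cells. `TablePull` is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class TablePull {
    static final class Table {
        final List<String> headers = new ArrayList<>();
        final List<List<String>> rows = new ArrayList<>();
    }

    static Table parse(String xml) {
        Table table = new Table();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            r.nextTag();
            r.require(XMLStreamConstants.START_ELEMENT, null, "table");
            // Children may come in any order: dispatch on each start tag as it arrives.
            while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
                String name = r.getLocalName();
                if (name.equals("th")) {
                    table.headers.add(r.getElementText());
                } else if (name.equals("tr")) {
                    table.rows.add(readRow(r));
                } else {
                    throw new IllegalArgumentException("unexpected <" + name + "> in <table>");
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed input", e);
        }
        return table;
    }

    /** Expects the reader on <tr>; collects the text of each <td> and stops on </tr>. */
    private static List<String> readRow(XMLStreamReader r) throws XMLStreamException {
        List<String> cells = new ArrayList<>();
        while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
            r.require(XMLStreamConstants.START_ELEMENT, null, "td");
            cells.add(r.getElementText()); // leaves the reader on </td>
        }
        return cells;
    }
}
```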

so i think i will need to wait for an example where the Element/Content
framework works, to see its full potential ...

thanks,

alek

Follow-Ups:
- Re: [xml-dev] parser models
  - From: Arjun Ray <aray@nyct.net>
- Re: [xml-dev] parser models
  - From: "Karl Waclawek" <karl@waclawek.net>

References:
- parser models
  - From: "Simon St.Laurent" <simonstl@simonstl.com>
- Re: [xml-dev] parser models
  - From: John Cowan <jcowan@reutershealth.com>
- Re: [xml-dev] parser models
  - From: Arjun Ray <aray@nyct.net>
- Re: [xml-dev] parser models
  - From: Aleksander Slominski <aslom@cs.indiana.edu>
- Re: [xml-dev] parser models
  - From: Arjun Ray <aray@nyct.net>
- Re: [xml-dev] parser models
  - From: Aleksander Slominski <aslom@cs.indiana.edu>
- Re: [xml-dev] parser models
  - From: Arjun Ray <aray@nyct.net>
