xml-dev - Re: [xml-dev] parser models


To: Arjun Ray <aray@nyct.net>
Subject: Re: [xml-dev] parser models
From: Aleksander Slominski <aslom@cs.indiana.edu>
Date: Tue, 24 Sep 2002 23:08:15 -0500
Cc: xml-dev@lists.xml.org
References: <r01050300-1015-399AD282CE5611D6B76F0003937A08C2@[192.168.124.21]> <200209232145.RAA03126@mail2.reutershealth.com> <skf0pugrt2v186al9126s64sdn2g062l0b@4ax.com> <3D90CBE1.3FE84B2B@cs.indiana.edu> <o0k1pu4tpacb45jtomkulhvpcbkigjs491@4ax.com>

Arjun Ray wrote:

> If the recursive descent pattern and explicit procedural coding appeal to
> you, yes, sure.  I don't particularly care for either - especially, the
> former in application code - mainly because of the "typed constant" or
> similar device intruding into the code (as Bill de Hora explained a while
> back): I find the while( event ) { switch( type ) { } } structure, which
> inevitably appears, more than a little clumsy.  [Note: I have a lot of
> hardcore C programming in my background!]  Either a lot of dissimilar or
> unrelated code appears together explicitly in case handling - which makes
> handler subclassing on event types difficult to impossible - or the switch
> block is a glorified dispatch table, in effect reproducing manually what
> polymorphic dispatch could have done for you automatically.

hi,

with careful use of good practices and patterns, switch (or even if) statements
can be minimized (if not eliminated entirely).
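
a minimal sketch of one such pattern: handlers are looked up by element name
in a table, so the pull loop needs no switch. it uses StAX (javax.xml.stream,
bundled with the JDK) as a stand-in for XmlPullParser so that it runs without
extra jars; the class name, the ElementHandler interface and the name/phone
elements are all made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch: element handlers found by name instead of a switch. */
final class HandlerTableSketch {

    /** Hypothetical callback: consumes one element, leaving the reader on its end tag. */
    interface ElementHandler {
        void handle(XMLStreamReader in, List<String> out) throws XMLStreamException;
    }

    // Dispatch table: adding an element type means adding an entry, not a case.
    private static final Map<String, ElementHandler> HANDLERS = new HashMap<>();
    static {
        HANDLERS.put("name", (in, out) -> out.add("name=" + in.getElementText()));
        HANDLERS.put("phone", (in, out) -> out.add("phone=" + in.getElementText()));
    }

    /** Parses the children of the root element; checked exceptions are wrapped. */
    static List<String> parse(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLStreamReader in = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            in.nextTag(); // move to the root start tag
            // nextTag() skips whitespace, comments and processing instructions
            while (in.nextTag() == XMLStreamReader.START_ELEMENT) {
                ElementHandler handler = HANDLERS.get(in.getLocalName());
                if (handler == null) {
                    throw new IllegalArgumentException(
                            "unexpected element <" + in.getLocalName() + ">");
                }
                handler.handle(in, out);
            }
            in.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed input", e);
        }
        return out;
    }
}
```

the while loop is still there, but the per-element logic lives in small
handlers that can be replaced or extended independently.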

> But if OO is no concern and like I said, procedural coding floats your
> boat, there's no problem.

i think the question is: what is the best approach for the job? i believe that
XML databinding (such as in your Element/Content framework)
can be expressed more easily using pull parsing, and that the result is easier
to understand and to maintain (debug). why should i a priori limit my options
to OO only? (BTW i do not think that callbacks are that much OO anyway ...)

> |>  public class HtmlTable implements Element, Content {
> |>      ...
> |>      Element newChild( ) {
> |>          return (Element) new HtmlTr( ) ; // sexier constructors possible
> |>      }
> |
> | how do you know that every child of table has to be tr?
>
> This was an example where you could enforce the requirement.  One of the
> possibilities hidden in the //-comment is using an inner class for the
> real Element interface implementation (like the Adapter pattern), and
> defer the child Content object instantiation to the content() call.  You
> can take advantage of boilerplate code to enforce other policies: e.g.
> treat an unexpected child as opaque and ignore the entire subtree rooted
> there, or treat it as transparent and drill down the subtree until you
> find a <tr> child, proceeding normally for its duration, then backing out
> of the subtree, etc.  (One way to see this is that, in Xpath terms, you
> aren't committed to table/tr: you can also handle table//tr when the nodes
> in between are "irrelevant".)

i do not think it is that simple. take for example the SAX parsing code
generated by JaxMe (see *Handler.java)

  http://www.extreme.indiana.edu/~aslom/xml/databinding/jaxme-xmlpull/jaxme_generated/phone/

and compare it with code that is using xml pull parsing

  http://www.extreme.indiana.edu/~aslom/xml/databinding/jaxme-xmlpull/phone/

they are both equivalent in functionality but which is easier to understand?

> |> A lot of this is boilerplate code that can also be "hoisted".  For deep or
> |> deeply recursive structures in the XML, this works very well, I've found.
> |
> | i think it can be done much easier with xml pull parser without any
> | special support, for example to create HtmlTable from XML input:
>
> | public class HtmlTable {
> |   Vector rows = new Vector();
> |   public static HtmlTable unmarshal(XmlPullParser pp) {
> |      if(pp.getEventType() != pp.START_TAG || !"table".equals(pp.getName()))
> |         throw new ValidationException("expected start tag for HTML table"+pp.getPositionDescription());
> |
> |      HtmlTable table = new HtmlTable();
> |      //parse table consisting of <tr>s
> |      while(pp.nextTag() == pp.START_TAG) {
>
> So, nextTag ignores everything until the next starttag event?  Shouldn't
> you have a switch-block here for the general case (whitespace?  processing
> instruction?)

i think that by using the default SAX2 content handler you also ignore it ...
(and it is also missing from the Element interface)

one of the beauties of XML pull APIs is that you can easily compose multiple
lower-level functions into a higher-level function.
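
a sketch of that kind of composition, again using StAX from the JDK as a
stand-in for XmlPullParser (its nextTag() likewise skips whitespace, comments
and processing instructions). skipSubtree, readRow and readTable are
hypothetical helper names; each one is built on the one before it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch: small pull helpers composed into larger ones. */
final class PullComposeSketch {

    /** Low level: on a start tag, consume everything up to its matching end tag. */
    static void skipSubtree(XMLStreamReader in) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = in.next();
            if (event == XMLStreamReader.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamReader.END_ELEMENT) {
                depth--;
            }
        }
    }

    /** Mid level: on a <tr> start tag, collect <td> text and skip other children. */
    static List<String> readRow(XMLStreamReader in) throws XMLStreamException {
        List<String> cells = new ArrayList<>();
        while (in.nextTag() == XMLStreamReader.START_ELEMENT) {
            if ("td".equals(in.getLocalName())) {
                cells.add(in.getElementText());
            } else {
                skipSubtree(in); // treat unexpected children as opaque
            }
        }
        return cells; // reader is now on the </tr> end tag
    }

    /** High level: read every <tr> of a <table>; checked exceptions are wrapped. */
    static List<List<String>> readTable(String xml) {
        List<List<String>> rows = new ArrayList<>();
        try {
            XMLStreamReader in = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            if (in.nextTag() != XMLStreamReader.START_ELEMENT
                    || !"table".equals(in.getLocalName())) {
                throw new IllegalArgumentException("expected <table> start tag");
            }
            while (in.nextTag() == XMLStreamReader.START_ELEMENT) {
                if ("tr".equals(in.getLocalName())) {
                    rows.add(readRow(in));
                } else {
                    skipSubtree(in);
                }
            }
            in.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed input", e);
        }
        return rows;
    }
}
```

note that StAX's nextTag() throws if it meets non-whitespace text directly
inside <table> or <tr>, so this sketch assumes element-only content there.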

> |         if("tr".equals(pp.getName())) {
> |             HtmlTr tr = HtmlTr.unmarshal(pp);
> |             table.rows.addElement(tr);
>
> Here, you're locked into addElement()-ing into Vector rows.

i have only added Vector as an example of doing something useful
with "tr" elements (there was nothing in your example - it seems that
HtmlTr object instances were constructed and then discarded ...)

> What if you
> wanted to do something else?  A variant class would have different fields
> and different processing logic right here, but it would have to reproduce
> the entirety of the while() block!

this is just 3 lines (and it is much shorter than the original Element/Content
with its multitude of methods to implement ...)

anyway that could be easily parametrized by adding new abstract method
or delegating finding of handler for "tr" start tag to different class
(like SOAP processor may find deserializer for element namespace uri
and local name possibly overridden by xsi:type by looking for it in registry etc. ...)
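
a rough sketch of the registry variant, with StAX from the JDK standing in
for XmlPullParser. the Deserializer interface, the class name and the
urn:example namespace are assumptions made up for this example, and the
xsi:type override is deliberately left out to keep it short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch: deserializers looked up by namespace URI and local name. */
final class DeserializerRegistrySketch {

    /** Hypothetical contract: called on a start tag, returns on the matching end tag. */
    interface Deserializer {
        Object read(XMLStreamReader in) throws XMLStreamException;
    }

    private final Map<QName, Deserializer> registry = new HashMap<>();

    void register(String namespaceUri, String localName, Deserializer d) {
        registry.put(new QName(namespaceUri, localName), d);
    }

    /** Delegates the current element to whichever deserializer is registered for it. */
    Object readElement(XMLStreamReader in) throws XMLStreamException {
        Deserializer d = registry.get(in.getName());
        if (d == null) {
            throw new IllegalArgumentException("no deserializer for " + in.getName());
        }
        return d.read(in);
    }

    /** Demo: reads each child of the root element through a two-entry registry. */
    static List<Object> readChildren(String xml) {
        DeserializerRegistrySketch r = new DeserializerRegistrySketch();
        r.register("urn:example", "int", p -> Integer.valueOf(p.getElementText().trim()));
        r.register("urn:example", "string", XMLStreamReader::getElementText);

        List<Object> values = new ArrayList<>();
        try {
            XMLStreamReader in = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            in.nextTag(); // root start tag
            while (in.nextTag() == XMLStreamReader.START_ELEMENT) {
                values.add(r.readElement(in));
            }
            in.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed input", e);
        }
        return values;
    }
}
```

the parsing loop never changes; a variant that wants different processing
for an element registers a different deserializer for it.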

> I'm not a great fan of copy-paste-edit - it leads to swaths of textually
> reproduced application (usually procedural!) code in a bunch of files all
> "similar but different".  But it has been the usual if not the inevitable
> consequence of setting things up this way.  It works great for one-off
> programs, but it doesn't scale to where different apps need to share the
> same processing *framework*.

based on my experience with SOAP 1.1 processing it works very well
in frameworks, and you can easily write general deserializers, such
as one to handle all Java Beans ....

> I just find a push API more
> amenable to separation of functions.

i am not sure what that means. what are the functions you have in mind?

thanks,

alek
