xml-dev - Re: [xml-dev] parser models

To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] parser models
From: Arjun Ray <aray@nyct.net>
Date: Tue, 24 Sep 2002 21:41:25 +0000
In-reply-to: <3D90CBE1.3FE84B2B@cs.indiana.edu>
References: <r01050300-1015-399AD282CE5611D6B76F0003937A08C2@[192.168.124.21]> <200209232145.RAA03126@mail2.reutershealth.com> <skf0pugrt2v186al9126s64sdn2g062l0b@4ax.com> <3D90CBE1.3FE84B2B@cs.indiana.edu>

Aleksander Slominski <aslom@cs.indiana.edu> wrote:
| Arjun Ray wrote:

|> In SAX, if you split up the event handling in the app among various 
|> classes, you have to mess with setHandler() calls explicitly and track 
|> the stack at the same time to get this right.  This is a pain.

| i agree completely and seen it before but i have reached different 
| conclusion: use pull parser API :-)

If the recursive descent pattern and explicit procedural coding appeal to
you, yes, sure.  I don't particularly care for either - especially the
former in application code - mainly because of the "typed constant" or
similar device intruding into the code (as Bill de Hora explained a while
back): I find the while( event ) { switch( type ) { } } structure, which
inevitably appears, more than a little clumsy.  [Note: I have a lot of
hardcore C programming in my background!]  Either a lot of dissimilar or
unrelated code appears together explicitly in case handling - which makes
handler subclassing on event types difficult to impossible - or the switch
block is a glorified dispatch table, in effect reproducing manually what
polymorphic dispatch could have done for you automatically.
  
But if OO is no concern and, as I said, procedural coding floats your
boat, there's no problem.
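To make the dispatch-table point concrete, here is a minimal sketch. It uses the JDK's StAX reader (`javax.xml.stream`) as a stand-in pull parser, not the XmlPull API from the quoted code, and the `Handler`, `drive()` and `StartTagCounter` names are hypothetical. Every case of the switch only forwards to a method, so the loop hand-codes what virtual dispatch on a handler class would do anyway, and specialised behaviour ends up in overrides.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DispatchDemo {
    // Overridable callbacks; the base class ignores every event.
    static class Handler {
        void startElement(String name) {}
        void endElement(String name) {}
        void characters(String text) {}
    }

    // The while( event ) { switch( type ) { } } loop reduced to a dispatch
    // table: each case only forwards to a method, i.e. it hand-codes what
    // virtual dispatch on the Handler already does.
    static void drive(String xml, Handler h) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    h.startElement(r.getLocalName());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    h.endElement(r.getLocalName());
                    break;
                case XMLStreamConstants.CHARACTERS:
                    h.characters(r.getText());
                    break;
                default:
                    break; // comments, PIs, document events: ignored in this sketch
            }
        }
        r.close();
    }

    // A specialised handler overrides only the callback it cares about.
    static class StartTagCounter extends Handler {
        int starts;
        @Override void startElement(String name) { starts++; }
    }

    public static void main(String[] args) throws XMLStreamException {
        StartTagCounter c = new StartTagCounter();
        drive("<table><tr><td>a</td></tr><tr/></table>", c);
        System.out.println(c.starts); // 4: table, tr, td, tr
    }
}
```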

|>  public class HtmlTable implements Element, Content {
|>      ...
|>      Element newChild( ) {
|>          return (Element) new HtmlTr( ) ; // sexier constructors possible
|>      }
| 
| how do you know that every child of table has to be tr?

This was an example where you could enforce the requirement.  One of the
possibilities hidden in the //-comment is using an inner class for the
real Element interface implementation (like the Adapter pattern), and
defer the child Content object instantiation to the content() call.  You
can take advantage of boilerplate code to enforce other policies: e.g.
treat an unexpected child as opaque and ignore the entire subtree rooted
there, or treat it as transparent and drill down the subtree until you
find a <tr> child, proceeding normally for its duration, then backing out
of the subtree, etc.  (One way to see this is that, in XPath terms, you
aren't committed to table/tr: you can also handle table//tr when the nodes
in between are "irrelevant".)
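One way the hoisted boilerplate and the opaque/transparent policies might look, sketched with the JDK's SAX API. Assumptions: `ElementHandler`, `Dispatcher`, `Ignore`, `Transparent`, `Table` and `Row` are hypothetical names, not the `Element`/`Content` interfaces from the quoted code, and each `<tr>` is flattened to its text. A single dispatcher keeps the handler stack, each handler's `newChild()` picks the object for the child element, and the two policies are small reusable classes.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class HandlerStackDemo {
    // Hypothetical per-element handler: each object decides who handles its children.
    interface ElementHandler {
        ElementHandler newChild(String name);
        default void text(String s) {}
        default void end() {}
    }

    // Opaque policy: swallow an unexpected child and its entire subtree.
    static final class Ignore implements ElementHandler {
        public ElementHandler newChild(String name) { return this; }
    }

    // Transparent policy: let the enclosing handler see the grandchildren,
    // so table//tr is handled as well as table/tr.
    static final class Transparent implements ElementHandler {
        private final ElementHandler outer;
        Transparent(ElementHandler outer) { this.outer = outer; }
        public ElementHandler newChild(String name) { return outer.newChild(name); }
    }

    static final class Row implements ElementHandler {
        final StringBuilder buf = new StringBuilder();
        public ElementHandler newChild(String name) { return this; } // fold cells into the row
        public void text(String s) { buf.append(s); }
    }

    static final class Table implements ElementHandler {
        final List<Row> rows = new ArrayList<>();
        final boolean transparent;
        Table(boolean transparent) { this.transparent = transparent; }
        public ElementHandler newChild(String name) {
            if (name.equals("tr")) {
                Row row = new Row();
                rows.add(row);
                return row;
            }
            return transparent ? new Transparent(this) : new Ignore();
        }
    }

    // The hoisted boilerplate: one SAX handler owns the stack, so application
    // classes never swap handlers or track depth themselves.
    static final class Dispatcher extends DefaultHandler {
        private final Deque<ElementHandler> stack = new ArrayDeque<>();
        Dispatcher(ElementHandler root) { stack.push(root); }
        @Override public void startElement(String uri, String local, String qName, Attributes atts) {
            stack.push(stack.peek().newChild(qName));
        }
        @Override public void endElement(String uri, String local, String qName) {
            stack.pop().end();
        }
        @Override public void characters(char[] ch, int start, int length) {
            stack.peek().text(new String(ch, start, length));
        }
    }

    static Table parse(String xml, boolean transparent) throws Exception {
        Table table = new Table(transparent);
        ElementHandler root = name -> table; // the document element is the table
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new Dispatcher(root));
        return table;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<table><tr><td>a</td></tr><tbody><tr><td>b</td></tr></tbody></table>";
        System.out.println(parse(xml, false).rows.size()); // 1: <tbody> subtree ignored
        System.out.println(parse(xml, true).rows.size());  // 2: drilled through <tbody>
    }
}
```

With the opaque policy the `<tbody>` subtree is skipped entirely; with the transparent policy the `<tr>` inside it is still routed to the table, which is the table//tr case.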

|> A lot of this is boilerplate code that can also be "hoisted".  For deep or
|> deeply recursive structures in the XML, this works very well, I've found.
| 
| i think it can be done much easier with xml pull parser without any
| special support, for example to create HtmlTable from XML input:

| public class HtmlTable {
|   Vector rows = new Vector();
|   public static HtmlTable unmarshal(XmlPullParser pp) throws XmlPullParserException, IOException {
|      if(pp.getEventType() != pp.START_TAG || !"table".equals(pp.getName()))
|         throw new ValidationException("expected start tag for HTML table"+pp.getPositionDescription());
| 
|      HtmlTable table = new HtmlTable();
|      // parse table consisting of <tr>s
|      while(pp.nextTag() == pp.START_TAG) {

So, nextTag ignores everything until the next start-tag event?  Shouldn't
you have a switch block here for the general case (whitespace, processing
instructions)?
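For comparison, a sketch of the general-case loop the question asks about, again using the JDK's StAX reader as a stand-in for XmlPull (an assumption; XmlPull's `nextTag()` may differ in detail). In StAX, `nextTag()` skips whitespace, comments and processing instructions but throws on any other text, so a loop that wants a different policy has to switch on the event type itself. `readRows()` is hypothetical and, for brevity, assumes each `<tr>` holds only text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PullTableDemo {
    // Collects the text of each <tr> child of <table>, switching on every
    // event type so that the policy for stray content is explicit.
    static List<String> readRows(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        r.nextTag(); // to the document element, skipping prolog whitespace/comments/PIs
        r.require(XMLStreamConstants.START_ELEMENT, null, "table");
        List<String> rows = new ArrayList<>();
        while (true) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    if (!"tr".equals(r.getLocalName())) {
                        throw new XMLStreamException(
                                "expected <tr>, got <" + r.getLocalName() + ">", r.getLocation());
                    }
                    rows.add(r.getElementText()); // text-only <tr>; leaves the reader on </tr>
                    break;
                case XMLStreamConstants.END_ELEMENT: // </table>
                    r.close();
                    return rows;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.SPACE:
                    break; // stray text: ignored here, whereas nextTag() would throw
                case XMLStreamConstants.COMMENT:
                case XMLStreamConstants.PROCESSING_INSTRUCTION:
                    break; // skipped, as nextTag() would also do
                default:
                    break;
            }
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<table><tr>a</tr><!-- note --><?pi x?> stray <tr>b</tr></table>";
        System.out.println(readRows(xml)); // [a, b]
    }
}
```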

|         if("tr".equals(pp.getName())) {
|             HtmlTr tr = HtmlTr.unmarshal(pp);
|             table.rows.addElement(tr);

Here, you're locked into addElement()-ing into Vector rows.  What if you
wanted to do something else?  A variant class would have different fields
and different processing logic right here, but it would have to reproduce
the entirety of the while() block!

I'm not a great fan of copy-paste-edit - it leads to swaths of textually
reproduced application (usually procedural!) code in a bunch of files all
"similar but different".  But it has been the usual if not the inevitable
consequence of setting things up this way.  It works great for one-off
programs, but it doesn't scale to where different apps need to share the
same processing *framework*. 

I'm not saying that a pull API isn't useful.  I just find a push API more
amenable to separation of functions.
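The variant-class problem described above can be sketched in push style with the JDK's SAX API: the traversal lives in the parser and in one base handler, and a variant overrides only a per-row hook instead of copying the whole while() block. `TableHandler`, `row()` and `LongestRow` are hypothetical names, and rows are reduced to their text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TableVariants {
    // Base push handler: the <tr> bookkeeping is written once, here.
    static class TableHandler extends DefaultHandler {
        final List<String> rows = new ArrayList<>();
        private StringBuilder current; // non-null while inside a <tr>

        @Override public void startElement(String uri, String local, String qName, Attributes atts) {
            if (qName.equals("tr")) current = new StringBuilder();
        }
        @Override public void characters(char[] ch, int start, int length) {
            if (current != null) current.append(ch, start, length);
        }
        @Override public void endElement(String uri, String local, String qName) {
            if (qName.equals("tr")) {
                row(current.toString());
                current = null;
            }
        }
        // Hook for each finished row; the default collects them, like rows.addElement(tr).
        protected void row(String text) { rows.add(text); }
    }

    // Variant with its own field and per-row logic; no traversal code is copied.
    static class LongestRow extends TableHandler {
        String longest = "";
        @Override protected void row(String text) {
            if (text.length() > longest.length()) longest = text;
        }
    }

    static <H extends DefaultHandler> H run(String xml, H handler) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<table><tr><td>ab</td></tr><tr><td>abcd</td></tr></table>";
        System.out.println(run(xml, new TableHandler()).rows);  // [ab, abcd]
        System.out.println(run(xml, new LongestRow()).longest); // abcd
    }
}
```

Both handlers share the same traversal; only the `row()` override differs between them.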

Follow-Ups:
- Re: [xml-dev] parser models
  - From: Aleksander Slominski <aslom@cs.indiana.edu>
- Re: [xml-dev] parser models
  - From: John Cowan <jcowan@reutershealth.com>

References:
- parser models
  - From: "Simon St.Laurent" <simonstl@simonstl.com>
- Re: [xml-dev] parser models
  - From: John Cowan <jcowan@reutershealth.com>
- Re: [xml-dev] parser models
  - From: Arjun Ray <aray@nyct.net>
- Re: [xml-dev] parser models
  - From: Aleksander Slominski <aslom@cs.indiana.edu>
