xml-dev - Re: [xml-dev] Push and Pull?


To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Push and Pull?
From: "Clark C . Evans" <cce@clarkevans.com>
Date: Fri, 25 Jan 2002 23:09:11 -0500
In-reply-to: <000001c1a629$d7675d30$0cebd018@ilyatbjnaicm5x>; from Isterin@ciber.com on Fri, Jan 25, 2002 at 09:25:15PM -0800
References: <20020125211323.A22477@doublegemini.com> <000001c1a629$d7675d30$0cebd018@ilyatbjnaicm5x>
User-agent: Mutt/1.2.5i

On Fri, Jan 25, 2002 at 09:25:15PM -0800, Sterin, Ilya wrote:
| pull - the program controls what happens to the data the parser returns
| push - the parser controls what happens to the data returns, since it
| maintains the while loop.

Most programs have to maintain state.  There are two primary
ways to do this, via the heap or via the stack.  

The most convenient way of maintaining state is by using the
program stack.  In this case, you have functions which call 
other functions; local variables and return results are kept 
on the call stack.  The compiler (or interpreter) performs
all of the memory management, pushing all of the arguments
and local variables for a function call onto the program
stack before the function is invoked and then popping these
items off the stack when the function terminates.   In this
way the programmer doesn't have to worry about managing memory.

The other way of maintaining state is via the heap.  In this method,
the program allocates memory and gets a pointer back.  The program
can then pass around the pointer as needed to access the allocated
memory.  When the program is done with the memory, it is the 
program's responsibility to free the allocated memory.  Many 
languages provide for garbage collection so that the programmer
doesn't have to worry about freeing allocated memory.   In most
object oriented languages, memory is encapsulated as an object.
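
To make the two styles concrete, here is a minimal sketch in Python.
The names (depth_stack, DepthTracker, the tuple layout of the tree)
are made up for illustration.  Python hides the real allocation, so
treat this as an analogy: the first version keeps its state in the
local variables of nested calls, the second keeps it in an object
that somebody has to carry around.

```python
def depth_stack(node):
    """State lives in the locals of each active call and vanishes on return."""
    _name, children = node
    return 1 + max((depth_stack(child) for child in children), default=0)


class DepthTracker:
    """Hypothetical object holding the same state explicitly (the "heap" style)."""

    def __init__(self):
        self.current = 0   # how deep we are right now
        self.deepest = 0   # the deepest level seen so far

    def enter(self):
        self.current += 1
        self.deepest = max(self.deepest, self.current)

    def leave(self):
        self.current -= 1


# A node is (name, list_of_children); this layout is an assumption of the sketch.
tree = ("a", [("b", [("c", [])]), ("d", [])])

tracker = DepthTracker()
for step in ("enter", "enter", "enter", "leave", "leave", "enter", "leave", "leave"):
    getattr(tracker, step)()
```

Both compute the same depth for the same tree; the difference is
only in who remembers the current position between steps.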

The problem here is that if you have two "objects" interacting
with each other, only one of them can use the stack at any 
given time.   So, if P means Producer, and C means consumer,
you have two different flow control models to choose from....
   
          "pull"                       "push"

         P     P                     C     C
        PPP   PPP         vs        CCC   CCC
   P   CCCCC CCCCC   P          P  PPPPP PPPPP P 
  PPP CCCCCCCCCCCCC PPP        PPPPPPPPPPPPPPPPPP
 CCCCCCCCCCCCCCCCCCCCCCC      CCCCCCCCCCCCCCCCCCCC
 
  A     B     C     D          A    B     C    D

   Call stack over time           Call stack over time


A.  The program starts the parser.
B.  The first node arrives.
C.  The second node arrives.
D.  The parser shuts down.

Time as seen from the pull model:

A. The producer's initialization function (aka Parse)
   is pushed onto the stack, it does what it has to do.
   It allocates memory on the heap (to save its state)
   and then returns this chunk of allocated memory
   (usually as a Document object).

B. The consumer proceeds along, and eventually is ready
   for the first node.  So, it asks the producer for
   the next node.  The producer is pushed onto the 
   stack and then uses the pointer (to memory on the heap)
   to figure out what node to return.  Then, control
   is once again given back to the consumer, who can
   process the node.

C. The same pattern as B, where control moves from
   the consumer (who keeps its state on the stack)
   to the producer (who reads its state from the heap).
   

D. The consumer figures out that it doesn't need any
   more nodes and shuts down the parser.  In this
   case, the parser is pushed onto the stack once
   more and is given its heap memory.  The parser then
   cleans up any opened files and frees the memory.
   Control returns to the consumer.

   What's important is that the consumer can at all
   times use the stack to maintain state; where the
   producer must use the heap (via an object).
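
The pull steps A through D can be sketched roughly as follows.
This is only an illustration under made-up assumptions: PullParser,
next_node, close and the toy "( ... )" token syntax are all
hypothetical, not any real XML API.  The point to notice is that
the consumer tracks nesting simply by calling itself, while the
producer has to remember its position in an object.

```python
class PullParser:
    """Hypothetical producer: its state lives in this object between calls."""

    def __init__(self, text):          # step A: initialise and save state
        self.tokens = text.split()
        self.pos = 0

    def next_node(self):               # steps B and C: consult saved state
        if self.pos >= len(self.tokens):
            return None                # no more nodes
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def close(self):                   # step D: release what was held
        self.tokens = []
        self.pos = 0


def read_list(parser):
    """Consumer: nesting depth is tracked by the call stack, not by a variable."""
    items = []
    while True:
        tok = parser.next_node()
        if tok is None or tok == ")":
            return items
        if tok == "(":
            items.append(read_list(parser))   # recurse for a nested list
        else:
            items.append(tok)


parser = PullParser("a ( b c ( d ) ) e")
result = read_list(parser)
parser.close()
```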

Time as seen from the push model:

A. The consumer loads the parser and provides the
   parser with call-back functions.  The consumer
   can also provide a data structure; usually memory
   allocated from the heap.  The producer then 
   initializes.

B. When the producer is ready, it sends a node by 
   pushing the consumer's call-back function onto
   the call stack.  Then, when the consumer's call-back 
   is finished, it is popped from the stack. So, if
   the consumer wishes to track state between event 
   notifications, it must use the heap-allocated pointer 
   provided earlier.

C. This is exactly the same as B; note that between
   B and C, the producer is in control of the process.

D. When the producer wants to stop, it simply closes
   any resources it may have, and then returns.  In
   this way, any state maintained by the producer is
   automatically recovered as it is popped from the
   call stack.   Note that the consumer will most
   likely then free the dynamic memory used by the
   heap-allocated pointer provided earlier.
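
The push steps A through D can be sketched the same way.  Again
the names (push_parse, ListBuilder, on_node) and the toy
"( ... )" token syntax are hypothetical.  Here the producer is a
plain loop whose position is a local variable, and it is the
consumer that must keep an explicit stack of open lists in an
object, because its call-back returns after every node.

```python
class ListBuilder:
    """Hypothetical consumer: state must survive between call-backs."""

    def __init__(self):
        self.root = []
        self.open_lists = [self.root]   # explicit stack, kept in the object

    def on_node(self, tok):
        if tok == "(":
            new = []
            self.open_lists[-1].append(new)
            self.open_lists.append(new)
        elif tok == ")":
            self.open_lists.pop()
        else:
            self.open_lists[-1].append(tok)


def push_parse(text, callback):
    """Producer: owns the loop, so its state is just the loop variable."""
    for tok in text.split():
        callback(tok)    # steps B and C: the call-back is pushed, then popped
    # step D: returning pops the producer's state off the call stack


builder = ListBuilder()                                  # step A
push_parse("a ( b c ( d ) ) e", builder.on_node)
```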

Quite clearly, using the stack to maintain state is
much simpler than using the heap.  Thus, the 
"pull" model is better for the consumer.  I used 
"consumer" and "producer" everywhere because 
the application is the "consumer" when getting input, 
and the "producer" when making output.

So, one may ask, which is better, "push" or "pull"?
The answer depends on another question: "which one, the
producer or the consumer, should be easier to write?"
If the answer is the consumer, then you want a 
"pull" interface.  If the answer is the producer, 
then you want the "push" interface.  Thus, consider
the information flow below.


FILE   ->   EVENTS     ->       EVENTS ->  FILE
    (PARSER)      (APPLICATION)     (EMITTER)

In this case, you want to make the application's
life the easiest.  Thus, the PARSER should have
a "pull" interface, and the EMITTER should have
a "push" interface.  (I use emitter as the opposite
process of parser).
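
As a rough sketch of that flow, assume a line-oriented "parser"
and "emitter" (LineParser, LineEmitter and application are
hypothetical names; real XML events would be richer).  The
application pulls from one side and pushes to the other, so it
owns the loop and its state is an ordinary local variable.

```python
import io


class LineParser:
    """Hypothetical pull parser: each non-blank line is one event."""

    def __init__(self, src):
        self.src = src

    def next_event(self):
        while True:
            line = self.src.readline()
            if line == "":
                return None          # end of input
            line = line.strip()
            if line:
                return line


class LineEmitter:
    """Hypothetical push emitter: the application hands it each event."""

    def __init__(self, dst):
        self.dst = dst

    def emit(self, event):
        self.dst.write(event + "\n")


def application(src, dst):
    parser = LineParser(src)
    emitter = LineEmitter(dst)
    count = 0                        # application state, kept on the stack
    while True:
        event = parser.next_event()  # pull from the parser
        if event is None:
            return count
        count += 1
        emitter.emit(f"{count}: {event.upper()}")   # push to the emitter


out = io.StringIO()
n = application(io.StringIO("alpha\n\nbeta\n"), out)
```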

Now... on to your question...

| Is it safe to say then that the underlying DOM parser is 
| rather a pull model, since it really maintains the loop.

Yes.  Under the sheets, the DOM parser is most likely
using the standard input/output library.  This library
is "pull" for input (read) and "push" for output (write).
So the DOM parser is using a "pull" model when it reads
from the standard input/output library.
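
A small illustration of that read/write asymmetry, using Python's
in-memory streams as a stand-in for files (copy_upper is a made-up
helper for this sketch):

```python
import io


def copy_upper(src, dst, size=4):
    """The caller owns the loop: it pulls with read() and pushes with write()."""
    while True:
        chunk = src.read(size)     # pull: "give me the next few characters"
        if not chunk:
            break                  # an empty result means end of input
        dst.write(chunk.upper())   # push: "here is some data, take it"


src = io.StringIO("<a>text</a>")
dst = io.StringIO()
copy_upper(src, dst)
```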

| Or is it not, because it's build based on the pull model, 
| the the DOM (processor) is a program that has control of
| the loop and actually retains it in memory?

This sounds paradoxical because you've switched contexts
halfway through your question.  ;)  In its implementation, 
the DOM *uses* a pull interface from the standard input.  It
also offers a pull interface to its consumers.

How the DOM maintains its information (either reading it
all into memory up-front, or doing it incrementally) is
an implementation detail and does not invalidate the
push vs pull interface distinction.  What is important
to label is the boundary between the producer and the
consumer.
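
For example, with Python's xml.dom.minidom (which, as far as I
know, builds the whole tree up front), the consumer still sees a
pull interface: it asks for child nodes when it wants them and
keeps its own place in the tree on the call stack.  The helper
element_names is made up for this sketch.

```python
from xml.dom.minidom import parseString


def element_names(node, depth=0):
    """Consumer: walks the tree by asking for children; depth lives on the stack."""
    names = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            names.append((depth, child.tagName))
            names.extend(element_names(child, depth + 1))
    return names


doc = parseString("<a><b><c/></b><d/></a>")
names = element_names(doc)
doc.unlink()   # release the tree once the consumer is done with it
```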

Kind Regards,

Clark

-- 
Clark C. Evans                   Axista, Inc.
http://www.axista.com            800.926.5525
XCOLLA Collaborative Project Management Software
