On Fri, Jan 25, 2002 at 09:25:15PM -0800, Sterin, Ilya wrote:
| pull - the program controls what happens to the data the parser returns
| push - the parser controls what happens to the data it returns, since it
| maintains the while loop.
Most programs have to maintain state. There are two primary
ways to do this: via the heap or via the stack.
The most convenient way of maintaining state is by using the
program stack. In this case, you have functions which call
other functions; local variables and return results are kept
on the call stack. The compiler (or interpreter) performs
all of the memory management, pushing all of the arguments
and local variables for a function call onto the program
stack before the function is invoked and then popping these
items off the stack when the function terminates. In this
way the programmer doesn't have to worry about managing memory.
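A minimal Python sketch of stack-kept state, assuming a nested list as a stand-in for a document tree (the function name `max_depth` is illustrative, not from any library). The current depth exists only as a parameter of each recursive call, so the interpreter creates and discards it automatically.

```python
# Illustrative only: a nested list stands in for a document tree.
def max_depth(node, depth=1):
    """Return the deepest nesting level of a nested list.

    `depth` and `deepest` are locals: each call gets its own copies
    on the call stack, and they vanish when the call returns.
    """
    deepest = depth
    for child in node:
        if isinstance(child, list):
            deepest = max(deepest, max_depth(child, depth + 1))
    return deepest


print(max_depth([1, [2, [3]], [4]]))  # prints 3
```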
The other way of maintaining state is via the heap. In this method,
the program allocates memory and gets a pointer back. The program
can then pass around the pointer as needed to access the allocated
memory. When the program is done with the memory, it is the
program's responsibility to free the allocated memory. Many
languages provide for garbage collection so that the programmer
doesn't have to worry about freeing allocated memory. In most
object-oriented languages, such heap-allocated state is
encapsulated as an object.
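For contrast, a sketch of heap-kept state in Python: the hypothetical `DepthTracker` class keeps `depth` and `deepest` as attributes of an object, so the values survive from one method call to the next. Python's garbage collector reclaims the object once nothing refers to it, which is the garbage-collection case described above.

```python
# Hypothetical example class; the state lives in an object (on the heap).
class DepthTracker:
    def __init__(self):
        self.depth = 0     # current nesting level
        self.deepest = 0   # deepest level seen so far

    def open(self):
        self.depth += 1
        self.deepest = max(self.deepest, self.depth)

    def close(self):
        self.depth -= 1


tracker = DepthTracker()
for ch in "(()(()))":
    # Each call returns immediately; only the object remembers the state.
    if ch == "(":
        tracker.open()
    else:
        tracker.close()

print(tracker.deepest)  # prints 3
```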
The problem here is that when two "objects" interact with
each other, only one of them can keep its state on the stack
at any given time; the other must keep its state on the heap.
So, if P means producer and C means consumer, you have two
different flow-control models to choose from:
Pull model                 Push model
      P       P                  C       C
     PPP     PPP       vs       CCC     CCC
 P  CCCCC   CCCCC  P        P  PPPPP   PPPPP  P
PPP CCCCCCCCCCCCC PPP      PPPPPPPPPPPPPPPPPPPPP
 A    B       C    D        A    B       C    D
Call stack over time       Call stack over time
A. The program starts the parser.
B. The first node arrives.
C. The second node arrives.
D. The parser shuts down.
Time as seen from the pull model:
A. The producer's initialization function (aka Parse)
is pushed onto the stack and does whatever setup it needs.
It allocates memory on the heap (to save its state)
and then returns this chunk of allocated memory
(usually as a Document object).
B. The consumer proceeds along and eventually is ready
for the first node. So, it asks the producer for
the next node. The producer is pushed onto the
stack and then uses the pointer (to memory on the heap)
to figure out what node to return. Then control
is once again given back to the consumer, who can
deal with the node.
C. The same pattern as B, where control moves from
the consumer (who keeps its state on the stack)
to the producer (who reads its state from the heap).
D. The consumer figures out that it doesn't need any
more nodes and shuts down the parser. In this
case, the parser is pushed onto the stack once
more and handed its heap memory. The parser then
cleans up any opened files and frees the memory.
Control returns to the consumer.
What's important is that the consumer can at all
times use the stack to maintain state, whereas the
producer must use the heap (via an object).
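A minimal sketch of the pull model in Python (3.8+, for `:=`), under stated assumptions: the producer is a hypothetical `PullParser` whose only state, its position in the input, is kept on the object (i.e. on the heap), and which hands back one node per `next_node()` call. The "document" is just a pre-split list of tag and text tokens standing in for real XML parsing. The consumer owns the loop, so its own state (`depth`, `deepest`) stays in ordinary local variables.

```python
# Hypothetical toy producer, not a real XML parser API: the "document"
# is a pre-split list of tag/text tokens.
class PullParser:
    def __init__(self, tokens):
        # A. "Parse": allocate state on the heap and return it as an object.
        self.tokens = list(tokens)
        self.pos = 0

    def next_node(self):
        # B/C. The consumer calls in; the saved position tells the
        # producer which node to return before control goes back.
        if self.pos >= len(self.tokens):
            return None
        node = self.tokens[self.pos]
        self.pos += 1
        return node

    def close(self):
        # D. Release resources (a real parser would close files here).
        self.tokens = []
        self.pos = 0


def consume(parser):
    """The consumer owns the loop, so its state stays in local variables."""
    depth = deepest = 0
    while (node := parser.next_node()) is not None:
        if node.startswith("</"):
            depth -= 1
        elif node.startswith("<"):
            depth += 1
            deepest = max(deepest, depth)
    parser.close()
    return deepest


print(consume(PullParser(["<a>", "<b>", "text", "</b>", "</a>"])))  # prints 2
```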
Time as seen from the push model:
A. The consumer loads the parser and provides the
parser with call-back functions. The consumer
can also provide a data structure, usually memory
allocated from the heap. The producer then takes
control and starts parsing.
B. When the producer is ready, it sends a node by
pushing the consumer's call-back function onto
the call stack. When the consumer's call-back
is finished, it is popped from the stack. So, if
the consumer wishes to track state between event
notifications, it must use the heap-allocated pointer
it provided in step A.
C. This is exactly the same as B, except that between
B and C the producer is in control of the process.
D. When the producer wants to stop, it simply closes
any resources it may have, and then returns. In
this way, any state maintained by the producer is
automatically recovered as it is popped from the
call stack. Note that the consumer will most
likely then free the dynamic memory used by the
heap-allocated pointer provided earlier.
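The push model, sketched the same way: a hypothetical `push_parse` function owns the `for` loop and calls the consumer's callback once per node. Because the callback returns after every node, the consumer has to keep its running state in an object it passed in up front (here a `dict`, standing in for the heap-allocated pointer of step A). The names and token format are illustrative assumptions.

```python
# Hypothetical toy producer, not a real XML parser API.
def push_parse(tokens, on_node, user_data):
    """The producer owns the loop; its own state is on the stack."""
    for node in tokens:
        on_node(node, user_data)   # B/C. call (push) the consumer's callback
    # D. Returning pops the producer's frame, and its state with it.


def on_node(node, state):
    """Consumer callback: it returns after every node, so anything it
    must remember is read from and written back to `state`."""
    if node.startswith("</"):
        state["depth"] -= 1
    elif node.startswith("<"):
        state["depth"] += 1
        state["deepest"] = max(state["deepest"], state["depth"])


state = {"depth": 0, "deepest": 0}   # A. consumer-allocated heap state
push_parse(["<a>", "<b>", "text", "</b>", "</a>"], on_node, state)
print(state["deepest"])  # prints 2
```

In Python the `dict` is reclaimed by the garbage collector rather than freed explicitly as in step D.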
Quite clearly, using the stack to maintain state is
much simpler than using the heap. Thus, the
"pull" model is better for the consumer. I use
"consumer" and "producer" everywhere because
the application is the "consumer" when getting input,
and the "producer" when making output.
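To make the "much simpler" claim concrete, here is a sketch of one consumer task, building a nested tree from the same toy tokens, written both ways. The names `build_pull` and `TreeBuilder` are hypothetical. With pull, each open element is simply a pending recursive call on the stack; with push, the callback has to rebuild that stack by hand on the heap.

```python
TOKENS = ["<a>", "<b>", "text", "</b>", "</a>"]   # toy input, not real XML


def build_pull(next_node):
    """Pull: each open element is a pending recursive call on the stack."""
    children = []
    while (node := next_node()) is not None:
        if node.startswith("</"):
            return children                      # this element is finished
        if node.startswith("<"):
            children.append({node[1:-1]: build_pull(next_node)})
        else:
            children.append(node)
    return children


class TreeBuilder:
    """Push: the open elements must be kept in an explicit heap stack."""

    def __init__(self):
        self.stack = [[]]                        # children of each open element

    def on_node(self, node):
        if node.startswith("</"):
            self.stack.pop()
        elif node.startswith("<"):
            children = []
            self.stack[-1].append({node[1:-1]: children})
            self.stack.append(children)
        else:
            self.stack[-1].append(node)


tokens = iter(TOKENS)
pulled = build_pull(lambda: next(tokens, None))

builder = TreeBuilder()
for node in TOKENS:                              # stands in for the parser's loop
    builder.on_node(node)

print(pulled)                       # prints [{'a': [{'b': ['text']}]}]
print(pulled == builder.stack[0])   # prints True
```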
So, one may ask, which is better: "push" or "pull"?
The answer raises the question, "which one, the
producer or the consumer, should be easier to write?"
If the answer is the consumer, then you want a
"pull" interface. If the answer is the producer,
then you want a "push" interface. Thus, consider
the information flow below.
FILE   ->   EVENTS   ->   EVENTS   ->   FILE
    (PARSER)   (APPLICATION)   (EMITTER)
In this case, you want to make the application's
life the easiest. The application is the consumer
of the PARSER and the producer for the EMITTER, so
the PARSER should have a "pull" interface, and the
EMITTER should have a "push" interface. (I use
"emitter" for the process opposite to a parser.)
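A sketch of that arrangement, assuming line-per-event toy input: a generator over the input lines stands in for a pull parser, and a hypothetical `Emitter` class is pushed to via `write_event`. The application in the middle owns the single loop, so it can keep its own state (here a counter) in a local variable while talking to both ends.

```python
import io


class Emitter:
    """Hypothetical push-style emitter: the application hands it events."""

    def __init__(self, out):
        self.out = out

    def write_event(self, event):
        self.out.write(event + "\n")


def application(parser, emitter):
    """Pulls events from the parser and pushes events to the emitter."""
    count = 0                                # plain local (stack) state
    for event in parser:                     # pull: ask for the next event
        count += 1
        emitter.write_event(event.upper())   # arbitrary transformation, then push
    return count


source = io.StringIO("<a>\ntext\n</a>\n")
parser = (line.rstrip("\n") for line in source)   # stands in for a pull parser
sink = io.StringIO()
n = application(parser, Emitter(sink))
print(n)  # prints 3
```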
Now... on to your question...
| Is it safe to say then that the underlying DOM parser is
| rather a pull model, since it really maintains the loop.
Yes. Under the covers, the DOM parser is most likely
using the standard input/output library. This library
is "pull" for input (read) and "push" for output (write).
So, yes. The DOM parser is using a "pull" interface from
the standard input/output library.
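In Python terms, a minimal illustration of that claim: the caller pulls data with `read()`, deciding when and how much to ask for, and pushes data with `write()`. `io.StringIO` is used here only so the example runs without a real file; the chunk size of 4 is arbitrary.

```python
import io

src = io.StringIO("<doc>hello</doc>")
dst = io.StringIO()

# Pull: the caller's loop asks the library for the next chunk.
while chunk := src.read(4):
    # Push: the caller hands each chunk to the library to be written.
    dst.write(chunk)

print(dst.getvalue())  # prints <doc>hello</doc>
```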
| Or is it not, because it's built based on the pull model,
| the DOM (processor) is a program that has control of
| the loop and actually retains it in memory?
This sounds paradoxical because you've switched contexts
halfway through your question. ;) In DOM's implementation,
it *uses* a pull interface from the standard input library.
It also has a pull interface for its consumers.
How the DOM maintains its information (either reading it
all into memory up-front, or doing it incrementally) is
an implementation detail and does not invalidate the
push vs. pull interface distinction. What is important
to label is the boundary between the producer and the
consumer.
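A sketch of that point, assuming a hypothetical `next_node()` pull interface and one node per line with no blank lines: `EagerProducer` reads everything into memory up front, `LazyProducer` reads one line per request, yet the consumer loop in `collect` is identical for both.

```python
import io


class EagerProducer:
    """Hypothetical: reads all input up front, then hands out nodes."""

    def __init__(self, stream):
        self.nodes = stream.read().split()
        self.pos = 0

    def next_node(self):
        if self.pos == len(self.nodes):
            return None
        self.pos += 1
        return self.nodes[self.pos - 1]


class LazyProducer:
    """Hypothetical: reads one line per request; nothing is buffered ahead."""

    def __init__(self, stream):
        self.stream = stream

    def next_node(self):
        line = self.stream.readline()
        return line.strip() if line else None


def collect(producer):
    """Consumer: the same pull loop works for either implementation."""
    nodes = []
    while (node := producer.next_node()) is not None:
        nodes.append(node)
    return nodes


text = "<a>\ntext\n</a>\n"
print(collect(EagerProducer(io.StringIO(text))))  # prints ['<a>', 'text', '</a>']
print(collect(LazyProducer(io.StringIO(text))))   # prints ['<a>', 'text', '</a>']
```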
Clark C. Evans Axista, Inc.
XCOLLA Collaborative Project Management Software