> -----Original Message-----
> From: Jeff Lowery [mailto:jlowery@scenicsoft.com]
> Sent: Friday, January 25, 2002 1:32 PM
> To: 'Sterin, Ilya'
> Cc: 'Dare Obasanjo '; 'xml-dev@lists.xml.org '
> Subject: RE: [xml-dev] Push and Pull?
>
>
> > >
> > > pull - kXML: the parser says where it is, the DH tells the parser
> > > which way to go. In other words, the document handler has the parser
> > > pull only the data it's interested in; branches get skipped. The
> > > document does not have to be in memory.
> >
> > I see where you are going with this, but do you mean that only the
> > branches marked as relevant are kept in memory, or none at all?
> > Because if nothing is cached, then this goes back to the pull idea.
>
> Hmmm. The pull-parser sends events, same as the push-parser.
> The data for the current node (name, plus maybe attributes if
> the node is an element) is stored in the event. It may not be
> much data, but it is "in-memory" at that point in time.
I actually meant in memory after the parsing process has ended. What I
find confusing is that DOM processors are, in a way, neither push nor
pull but rather, as Joe pointed out, a "slurp" model (in his words :-).
The application has no control over the parsing process, other than
possibly setting parameters before parsing, and the parser simply
presents the finished structure afterwards.
>
> The document handler may take that event and store it in the
> application's internal cache. Or it may just call the event's
> toString() method that spits the event information out to,
> say, stdout. In that case, the "in-memory" component is very
> transitory.
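A minimal sketch of the push model just described, using Python's standard-library SAX API as a stand-in for the Java parsers in this thread (an assumption). The parser calls the handler for every event; the handler alone decides what to cache, here only the text of `title` elements, and everything else is discarded as soon as the event passes. `TitleCollector` and `collect_titles` are hypothetical names.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Push model: the parser calls these methods; the handler decides what to keep."""

    def __init__(self):
        super().__init__()
        self.titles = []       # the only data this handler caches
        self._in_title = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buf = []

    def characters(self, content):
        # Character data may arrive in several chunks, so buffer it.
        if self._in_title:
            self._buf.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buf))
            self._in_title = False


def collect_titles(xml_text):
    handler = TitleCollector()
    # The parser drives: it walks every node and pushes each event to the handler.
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


doc = "<lib><book><title>A</title><body>long text</body></book><book><title>B</title></book></lib>"
print(collect_titles(doc))  # ['A', 'B']
```

Note that the `body` text still produces `characters` events; the handler simply chooses not to store them.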
>
> The main difference between push and pull is that the
> push-parser iterates depth-first through every node in the
> document. The pull parser can be directed to skip branches,
> so you don't get the subevents generated for nodes on those
> branches.
But in practice the pull parser still scans through each branch of the
document; it just might return only certain parts.
I guess this goes back to Joe's explanation again:
pull - the application calls the parser.
push - the parser calls the application.
Ilya
> If, say, you skip processing of an element, the pull parser just zips
> through the document content looking for the skipped element's end tag,
> and then generates its next event from the tag that follows. This can be
> more efficient for grabbing document data that's sparsely distributed:
> fewer events generated means less overhead.
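A minimal sketch of pull-style skipping in Python, assuming `xml.etree.ElementTree.iterparse` as a stand-in for kXML. The application asks for each event itself and, on reaching a branch it does not want (hypothetically `body`), drains that subtree's events up to its end tag. Python's stdlib offers no dedicated skip call, so this emulation saves only application-level work; as Ilya notes, the underlying parser still scans the skipped branch. `pull_titles` and the sample document are illustrative.

```python
import io
import xml.etree.ElementTree as ET


def pull_titles(xml_text, skip=("body",)):
    """Pull model: the application requests events and decides what to skip."""
    events = ET.iterparse(io.BytesIO(xml_text.encode("utf-8")), events=("start", "end"))
    titles = []
    seen = []  # start tags the application actually handled
    for event, elem in events:
        if event == "start" and elem.tag in skip:
            # Skip the subtree: consume events until this element's end tag.
            depth = 1
            for ev, _ in events:
                depth += 1 if ev == "start" else -1
                if depth == 0:
                    break
            continue
        if event == "start":
            seen.append(elem.tag)
        elif elem.tag == "title":
            titles.append(elem.text or "")
    return titles, seen


doc = ("<lib><book><title>A</title><body><p>x</p><p>y</p></body></book>"
       "<book><title>B</title></book></lib>")
titles, seen = pull_titles(doc)
print(titles)
print(seen)
```

The nested loop works because `iterparse` returns a single iterator, so draining it inside the outer loop advances the same event stream.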
>
> So I don't think the two are in any way equivalent, unless you plan on
> visiting every node in the document anyway, in which case they function
> pretty much the same.
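To illustrate the "visit every node" case, this sketch records element start/end events once from a SAX (push) parse and once from an `iterparse` (pull) loop over the same document. When nothing is skipped, the two event sequences coincide. Python stdlib APIs stand in for the parsers discussed in the thread, and the function names are hypothetical.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET


class Recorder(xml.sax.ContentHandler):
    """Push side: the parser hands us every element event."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def push_events(xml_text):
    rec = Recorder()
    xml.sax.parseString(xml_text.encode("utf-8"), rec)
    return rec.events


def pull_events(xml_text):
    # Pull side: we ask for every event and skip nothing.
    source = io.BytesIO(xml_text.encode("utf-8"))
    return [(ev, elem.tag) for ev, elem in ET.iterparse(source, events=("start", "end"))]


doc = "<lib><book><title>A</title><body><p>x</p></body></book></lib>"
print(push_events(doc) == pull_events(doc))  # True
```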
>
>
> >
> > >
> > > batch (or ??) - DOM: a single method loads the document into memory.
> > > The app then navigates it at its leisure.
> >
> > Ok, but I would consider this pull as well. Why do you not think that
> > the pull concept applies here? Instead of telling the parser what to
> > load, as in your pull explanation above, you are in a way telling it
> > to load the whole document and make all nodes relevant. So aren't we
> > in the pull concept area again?
>
> Right, but in a push-parser, you can choose what to cache;
> you can grab parts of a document. With a DOM parse method,
> you get the whole document. It's the difference between being
> served a seven-course meal (batch) and choosing from a buffet (pull).
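For contrast, a sketch of the batch (DOM) model using Python's `xml.dom.minidom`, again a stand-in for the Java DOM parsers under discussion. One call builds the entire tree, including the `body` content the application never looks at, and navigation happens only after parsing has finished. `dom_titles` is a hypothetical name.

```python
from xml.dom import minidom


def dom_titles(xml_text):
    # Batch model: a single call loads the whole document into memory...
    dom = minidom.parseString(xml_text)
    try:
        # ...and only then does the application navigate, in any order it likes.
        return [node.firstChild.data if node.firstChild else ""
                for node in dom.getElementsByTagName("title")]
    finally:
        dom.unlink()  # release the tree's internal references


doc = "<lib><book><title>A</title><body>long text</body></book><book><title>B</title></book></lib>"
print(dom_titles(doc))  # ['A', 'B']
```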
>
> >
> >
> > >
> > > fully directed - (XQuery??) - the document handler builds
> > > instructions that the parser uses to navigate and return data from
> > > the document.
> >
> > That's a new way to look at it for me. I'll have to give it some
> > thought. I would think XQuery would fit into the DOM/pull (batch)
> > category, since it just uses a different access syntax, but the
> > document is accessed the same way.
>
> Similar to pull, but here you're telling the parser up front what it is
> you're looking for, rather than directing it in real time. The problem
> with pull parsing is that, if you're not careful (sloppy coding), you
> may skip over nodes that you're later interested in; with the
> fully-directed approach, you get everything you intend to get because
> you specified it up front. The directed parser can then order your
> "queries" so that they are optimized for a single pass (unless there
> are navigation dependencies on the document content that preclude a
> single pass). This assumes, in both cases, that you know in advance
> what you're looking for.
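The "fully directed" parser is speculative in this thread, so what follows is only a hypothetical sketch of the idea, not XQuery and not any existing API. All wanted element paths (a simple slash-separated form invented here) are declared before parsing, a single sequential pass answers all of them, and any branch that no query can reach is skipped. Skipping is emulated by draining `iterparse` events, so the underlying parser still scans those bytes. `directed_extract` and the path syntax are assumptions.

```python
import io
import xml.etree.ElementTree as ET


def directed_extract(xml_text, queries):
    """Hypothetical pre-directed extraction: queries are fixed up front,
    then answered together in one sequential pass."""
    wanted = set(queries)
    # Proper prefixes of wanted paths: branches worth descending into.
    prefixes = {"/".join(q.split("/")[:i])
                for q in wanted for i in range(1, len(q.split("/")))}
    results = {q: [] for q in wanted}
    path = []
    events = ET.iterparse(io.BytesIO(xml_text.encode("utf-8")), events=("start", "end"))
    for event, elem in events:
        if event == "start":
            path.append(elem.tag)
            current = "/".join(path)
            if current not in wanted and current not in prefixes:
                # No query can match below here: drain this subtree's events.
                depth = 1
                for ev, _ in events:
                    depth += 1 if ev == "start" else -1
                    if depth == 0:
                        break
                path.pop()
        else:
            current = "/".join(path)
            if current in wanted:
                results[current].append(elem.text or "")
            path.pop()
    return results


doc = ("<lib><book><title>A</title><body><p>x</p></body></book>"
       "<book><title>B</title><year>2002</year></book></lib>")
print(directed_extract(doc, ["lib/book/title", "lib/book/year"]))
```

Because every query is known before the pass starts, the pruning decision at each start tag needs no look-ahead, which is the single-pass property described above; queries whose answers depend on document content would break this assumption.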
>
> Now, I've used three of these approaches in my own work. I
> haven't used a fully-directed parser (maybe the term
> pre-directed is better), although I envisage it as being
> something similar to SQL queries on a database. That's why I
> mention XQuery. The key difference is that a document in a
> file is sequential access, not random access as in a
> database, so I'm not fully cognizant of what optimizations
> are possible in a sequential access situation.
>
> Take care,
>
> Jeff
>