
   RE: [xml-dev] Push and Pull?




> -----Original Message-----
> From: Jeff Lowery [mailto:jlowery@scenicsoft.com] 
> Sent: Friday, January 25, 2002 1:32 PM
> To: 'Sterin, Ilya'
> Cc: 'Dare Obasanjo '; 'xml-dev@lists.xml.org '
> Subject: RE: [xml-dev] Push and Pull?
> 
> 
> > > 
> > > pull - kXML: the parser says where it is, the DH tells the parser
> > > which way to go. In other words, the document handler has the
> > > parser pull only the data it's interested in; branches get
> > > skipped. The document does not have to be in-memory
> > 
> > I see where you are going with this, but do you mean that only the
> > branches that are marked as relevant are kept in memory, or neither?
> > Because if nothing is cached, then this goes back to the pull idea.
> 
> Hmmm. The pull-parser sends events, same as the push-parser. 
> The data for the current node (name, plus maybe attributes if 
> the node is an element) is stored in the event. It may not be 
> much data, but it is "in-memory" at that point in time.

I actually meant in memory after the parsing process has ended.  The thing
that I guess is confusing is that DOM processors are in a way neither
push nor pull, but rather, as Joe pointed out, a slurp (in his words :-)
model.  The application has absolutely no control over the parsing
process, other than possibly setting parameters before parsing; the
parser simply presents the whole structure afterwards.
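
To make the slurp model concrete, here is a minimal sketch using Python's
standard xml.dom.minidom. The sample document and its element names are
made up for illustration; the point is only that one call does all the
parsing and the application navigates afterwards.

```python
from xml.dom import minidom

# Hypothetical sample document, invented for this illustration.
DOC = "<order><item sku='a1'>pen</item><item sku='b2'>ink</item></order>"

# One call parses everything; the application has no say during the parse.
dom = minidom.parseString(DOC)

# Only afterwards does the application walk the finished in-memory tree.
skus = [item.getAttribute("sku") for item in dom.getElementsByTagName("item")]
print(skus)
```

The whole tree stays in memory after parsing ends, which is the
"in memory" I had in mind above.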


> 
> The document handler may take that event and store it in the 
> application's internal cache. Or it may just call the event's 
> toString() method that spits the event information out to, 
> say, stdout. In that case, the "in-memory" component is very 
> transitory. 
> 
> The main difference between push and pull is that the 
> push-parser iterates depth-first through every node in the 
> document. The pull parser can be directed to skip branches, 
> so you don't get the subevents generated for nodes on those 
> branches. 

But in practice the pull parser still reads through every branch in
the document; it just might return only certain parts to the application.
I guess this goes back to Joe's explanation again.

pull - application notifies the parser.
push - parser notifies the application
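
A rough sketch of that distinction, using Python's standard xml.sax for
push and xml.dom.pulldom for pull. The sample document and the
NameCollector class are assumptions made up for the example; both halves
visit the same elements, and the only difference is who drives the loop.

```python
import xml.sax
from xml.dom import pulldom

# Hypothetical sample document, invented for this illustration.
DOC = "<order><item>pen</item><note>rush</note><item>ink</item></order>"


# push: the parser drives the loop and calls back into the application.
class NameCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # Called by the parser for every start tag, wanted or not.
        self.names.append(name)


handler = NameCollector()
xml.sax.parseString(DOC.encode("utf-8"), handler)
pushed = handler.names

# pull: the application drives the loop and asks the parser for each event.
pulled = []
for event, node in pulldom.parseString(DOC):
    if event == pulldom.START_ELEMENT:
        pulled.append(node.tagName)

print(pushed)
print(pulled)
```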


Ilya

> If, say, you skip processing of an element, the 
> pull parser just zips through the document content looking 
> for the skipped element's end tag, and then generates its 
> next event from the tag that follows. This can be more 
> efficient for grabbing document data that's sparsely 
> distributed. Fewer events generated = less overhead.
> 
> So I don't think the two are in any way equivalent, unless you 
> plan on visiting every node in the document anyway, in which 
> case they function pretty much the same.
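
Here is a small sketch of skipping a branch in a pull loop, using Python's
xml.dom.pulldom. As far as I know pulldom has no built-in skip call, so the
skip_branch helper below is hypothetical: it simply drains events up to the
matching end tag. The parser still reads the skipped text, but the
application's own handling code never sees those nodes.

```python
from xml.dom import pulldom

# Hypothetical sample document, invented for this illustration.
DOC = "<doc><skipme><a>1</a><b>2</b></skipme><keep>3</keep></doc>"


def skip_branch(events):
    """Drain events until the end tag of the element just opened.

    Hypothetical helper: call it right after receiving a START_ELEMENT.
    """
    depth = 1
    for event, _node in events:
        if event == pulldom.START_ELEMENT:
            depth += 1
        elif event == pulldom.END_ELEMENT:
            depth -= 1
            if depth == 0:
                return


seen = []
events = pulldom.parseString(DOC)
for event, node in events:
    if event == pulldom.START_ELEMENT:
        if node.tagName == "skipme":
            skip_branch(events)  # <a> and <b> never reach the code below
        else:
            seen.append(node.tagName)

print(seen)
```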
> 
> 
> > 
> > > 
> > > batch (or ??) - DOM: A single method loads the document
> > > in-memory. The app then navigates at its leisure
> > 
> > Ok, but I would consider this pull as well.  Why do you not think
> > that the pull concept applies here?  Instead of telling the parser
> > what to load as in your pull explanation above, you are in a way
> > telling it to load the whole document and make all nodes relevant.
> > Then I think we are in the pull concept area again?
> 
> Right, but in a pull-parser, you can choose what to cache; 
> you can grab parts of a document. With a DOM parse method, 
> you get the whole document. It's the difference between being 
> served a seven-course meal (batch) and choosing from a buffet (pull).
> 
> > 
> > 
> > > 
> > > fully directed - (XQuery??) - the document handler builds
> > > instructions that the parser uses to navigate and return data
> > > from the document.
> > 
> > That's a new way to look at it for me.  Have to give it a thought.
> > I would think XQuery would fit into the DOM/pull (batch) category,
> > since it just uses a different access syntax, but the document is
> > accessed the same way.
> 
> Similar to pull, but here you're telling the parser up front 
> what it is you're looking for, rather than directing it in 
> real time. The problem with pull parsing is that you may skip 
> over nodes that you're later interested in if you're not 
> careful (sloppy coding); with the fully-directed approach, 
> you get everything you intend to get because you specified it 
> up front. The directed parser can then order your "queries" so 
> that they are optimized for a single pass-through (unless 
> there are navigation dependencies on the document content 
> which may preclude a single pass-through). This assumes, in 
> both cases, you know in advance what you're looking for. 
> 
> Now, I've used three of these approaches in my own work. I 
> haven't used a fully-directed parser (maybe the term 
> pre-directed is better), although I envisage it as being 
> something similar to SQL queries on a database. That's why I 
> mention XQuery. The key difference is that a document in a 
> file is sequential access, not random access as in a 
> database, so I'm not fully cognizant of what optimizations 
> are possible in a sequential access situation.
> 
> Take care,
> 
> Jeff
> 
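
One possible reading of the pre-directed idea, sketched with Python's
xml.etree.ElementTree.iterparse. This is not XQuery and not a real query
engine: run_queries, the sample document and the tag names are all
assumptions for illustration. The application states everything it wants
up front, and a single sequential pass collects it.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = (
    "<lib>"
    "<book><title>A</title><isbn>1</isbn></book>"
    "<book><title>B</title><isbn>2</isbn></book>"
    "</lib>"
)


def run_queries(source, wanted_tags):
    """Collect the text of every element whose tag was requested up front.

    Hypothetical helper: the 'queries' are just tag names, answered in one
    sequential pass over the document.
    """
    results = {tag: [] for tag in wanted_tags}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in results:
            results[elem.tag].append(elem.text)
    return results


found = run_queries(io.BytesIO(DOC.encode("utf-8")), ["title", "isbn"])
print(found)
```

Because the requests are known before the pass starts, nothing wanted can
be skipped by accident, which is the advantage Jeff describes over
directing the parser in real time.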




 
