On Tue, Dec 18, 2001 at 12:34:10PM +0700, James Clark wrote:
| Perhaps the most fundamental decision in designing a pull API is
| whether the properties for each node are provided
| (a) by methods on some sort of node object returned by the
| scanner/parser/iterator object
| (b) by methods on the scanner/parser object itself; the scanner/parser
| object has methods to move to the next node
If one goes with (a), there seem to be two other choices:
(1) A single hierarchical iterator provided by
the scanner/parser, where BEGIN/END tags
are presented for each node.
(2) A hierarchy of flat iterators, with "nodes" that
have a "children" method that returns the
It seems that both you and John Cowan have chosen (1)
and I was wondering if either of you had tried (2).
I've tried both and find that (2) is much easier
to code with.
The primary argument against (2) is that the children()
method can only be called once if the iterator is over
a sequential access medium.
The primary advantage of (2) is that it works with nodes
instead of tags. A nested iterator also usually fits the
processing requirements, and it helps prevent bugs because
a sub-function cannot accidentally "consume" events that it
shouldn't be processing. This is helpful when composing a
processor pipeline from components by multiple vendors.
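
To make the "cannot consume" point concrete, here is a minimal sketch of (2) in Python. It is not taken from any real parser: the Node class, the parse() helper, the ("begin"/"end", name) event tuples and the _drain() method are all names assumed for illustration. Each node hands out its children once, and the parent skips whatever a sub-function left unread before moving on to the next sibling.

```python
from typing import Iterator, Optional, Tuple

Event = Tuple[str, str]  # ("begin", name) or ("end", name); assumed event shape


class Node:
    """One node in a hierarchy of flat iterators (illustrative only)."""

    def __init__(self, name: str, events: Iterator[Event]) -> None:
        self.name = name
        self._events = events                          # shared sequential source
        self._iter: Optional[Iterator["Node"]] = None  # set once children() is called

    def children(self) -> Iterator["Node"]:
        # On a sequential medium the children can only be handed out once.
        if self._iter is not None:
            raise RuntimeError("children() already called on " + self.name)
        self._iter = self._walk()
        return self._iter

    def _walk(self) -> Iterator["Node"]:
        for kind, name in self._events:
            if kind == "end":       # our own END tag: this subtree is finished
                return
            child = Node(name, self._events)
            yield child
            child._drain()          # skip whatever the consumer left unread

    def _drain(self) -> None:
        # Resume (or start) this node's iterator and run it to its END tag.
        for _ in (self._iter if self._iter is not None else self.children()):
            pass


def parse(events) -> Node:
    it = iter(events)
    _, name = next(it)              # the root's BEGIN event
    return Node(name, it)


def first_child_name(node: Node) -> Optional[str]:
    # A sub-function that stops early, leaving part of its subtree unread.
    for child in node.children():
        return child.name
    return None


# <a><b><c/><x/></b><d/></a> as a flat BEGIN/END stream
EVENTS = [("begin", "a"), ("begin", "b"), ("begin", "c"), ("end", "c"),
          ("begin", "x"), ("end", "x"), ("end", "b"),
          ("begin", "d"), ("end", "d"), ("end", "a")]

seen, inner = [], None
for child in parse(EVENTS).children():
    seen.append(child.name)
    if child.name == "b":
        inner = first_child_name(child)
print(seen, inner)
```

Even though first_child_name() abandons the "b" subtree after its first child, the outer loop still sees "d" as the next sibling, because the parent drains the unread events itself. With a single flat BEGIN/END iterator, the sub-function would have to count tags to stop at the right place.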
I've found it also helps to have an "access" method which
returns "random", "sequential", or "notaccessible" for the
cases where the node is, respectively, based on a random
access medium, sequential access but not yet read, or
sequential access and already read. This interface can also
be improved with a "makeRandom" method, which loads the
entire sub-tree into memory for random access.
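
A rough Python sketch of how "access" and "makeRandom" might fit together follows. Everything here is an assumption made for illustration: the Node class, the event tuples, the parse() helper, and the choice to treat a sub-tree loaded into memory as "random". The method is spelled make_random() only to follow Python naming.

```python
from typing import Iterator, List, Optional, Tuple

Event = Tuple[str, str]  # ("begin", name) or ("end", name); assumed event shape


class Node:
    """Node that makes its kind of access explicit (illustrative only)."""

    def __init__(self, name: str, events: Iterator[Event]) -> None:
        self.name = name
        self._events = events
        self._iter: Optional[Iterator["Node"]] = None   # one-shot sequential read
        self._loaded: Optional[List["Node"]] = None     # children held in memory

    def access(self) -> str:
        if self._loaded is not None:
            return "random"
        return "sequential" if self._iter is None else "notaccessible"

    def children(self) -> Iterator["Node"]:
        if self._loaded is not None:
            return iter(self._loaded)       # random access: repeatable
        if self._iter is not None:
            raise RuntimeError(self.name + " has already been read")
        self._iter = self._walk()
        return self._iter

    def make_random(self) -> None:
        # Load the entire sub-tree into memory so it can be re-read freely.
        if self.access() == "notaccessible":
            raise RuntimeError(self.name + " has already been read")
        if self._loaded is None:
            kids = []
            for child in self.children():
                child.make_random()
                kids.append(child)
            self._loaded = kids

    def _walk(self) -> Iterator["Node"]:
        for kind, name in self._events:
            if kind == "end":               # our own END tag
                return
            child = Node(name, self._events)
            yield child
            child._drain()                  # skip anything left unread

    def _drain(self) -> None:
        if self._loaded is not None:
            return                          # sub-tree was fully read already
        for _ in (self._iter if self._iter is not None else self.children()):
            pass


def parse(events) -> Node:
    it = iter(events)
    _, name = next(it)                      # the root's BEGIN event
    return Node(name, it)


# <a><b><c/></b><d/></a> as a flat BEGIN/END stream
EVENTS = [("begin", "a"), ("begin", "b"), ("begin", "c"), ("end", "c"),
          ("end", "b"), ("begin", "d"), ("end", "d"), ("end", "a")]

root = parse(EVENTS)
before = root.access()
kids = root.children()
b = next(kids)
b.make_random()
twice = [[c.name for c in b.children()] for _ in range(2)]
d = next(kids)
list(d.children())
states = (before, root.access(), b.access(), d.access())
```

In this sketch only "b" pays the memory cost of being loaded; "d" is read once and then reports "notaccessible", so a downstream consumer can check before relying on a second pass.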
Thus, (2) ends up being a hybrid DOM where the type of
access given to each node is made explicit.
Clark C. Evans Axista, Inc.
XCOLLA Collaborative Project Management Software