- From: steve@rsv.ricoh.com (Stephen R. Savitzky)
- To: xml-dev <xml-dev@ic.ac.uk>
- Date: 07 Dec 1999 17:30:17 -0800
This is basically a traditional top-down, recursive-descent parser.
Unfortunately, it's completely different from the way most XML parsers I've
seen work, although I believe there's a lexical layer underneath expat that
can be made to work this way.
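Below is a minimal sketch of the traditional top-down, recursive-descent shape, written for a toy XML subset with no attributes, entities, or comments. Each grammar rule becomes one function, so nesting in the document turns into nesting of calls. The class name RDParser and its outline-string result are hypothetical. This is not how expat or any other real parser is structured.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Toy grammar, a small subset of XML (no attributes, entities, or comments):
//   element ::= '<' Name '>' content '</' Name '>'
//   content ::= (element | text)*
// Each rule is one member function; nested elements become nested calls.
class RDParser {
public:
    explicit RDParser(std::string src) : s_(std::move(src)) {}

    // element: returns an outline of what it parsed, e.g. "doc(foo(#text))".
    std::string parseElement() {
        expect('<');
        std::string name = parseName();
        expect('>');
        std::string inner = parseContent();
        expect('<');
        expect('/');
        if (parseName() != name) throw std::runtime_error("mismatched end tag");
        expect('>');
        return name + "(" + inner + ")";
    }

private:
    // content: child elements and text runs, up to the enclosing end tag.
    std::string parseContent() {
        std::string out;
        while (pos_ < s_.size() && s_.compare(pos_, 2, "</") != 0) {
            if (!out.empty()) out += ",";
            if (s_[pos_] == '<') {
                out += parseElement();  // recursive descent into a child
            } else {
                while (pos_ < s_.size() && s_[pos_] != '<') ++pos_;  // skip text
                out += "#text";
            }
        }
        return out;
    }

    // Name: one or more ASCII letters or digits (simplified).
    std::string parseName() {
        std::size_t start = pos_;
        while (pos_ < s_.size() &&
               std::isalnum(static_cast<unsigned char>(s_[pos_]))) {
            ++pos_;
        }
        if (pos_ == start) throw std::runtime_error("expected a name");
        return s_.substr(start, pos_ - start);
    }

    // Consume exactly one expected character or report an error.
    void expect(char c) {
        if (pos_ >= s_.size() || s_[pos_] != c) {
            throw std::runtime_error(std::string("expected '") + c + "'");
        }
        ++pos_;
    }

    std::string s_;
    std::size_t pos_ = 0;
};
```

For example, parsing `<doc><foo>hi</foo><bar></bar></doc>` yields `doc(foo(#text),bar())`, and a mismatched end tag such as `<a></b>` throws std::runtime_error.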
But there's a better way of looking at the situation, namely that what you
really want to do is make a top-down traversal of the document's parse tree.
In other words, at any given position in the tree, you want to do something
like the following pseudocode:
// process a <foo> element.
void Foo::process(const XML::Element &elem) {
    // do the setup
    for (const XML::Node *node = elem.getFirstChild();
         node != nullptr;
         node = node->getNextSibling())
    {
        processChild(node); // dispatch on node's type & tag
    }
    // do the cleanup
}
This works as-is if the result of your parse is a DOM tree or some
equivalent parse-tree representation of the document, but holding the whole
tree in memory costs space proportional to the size of the document.
So the next step is to use a parser that looks like a tree traverser:
// process a <foo> element.
void Foo::process(TreeTraverser &it) {
    // do the setup, using it.getAttrList(), etc. on the current node
    if (it.hasChildren()) {
        for (it.toFirstChild(); !it.atEnd(); it.toNextSibling()) {
            processChild(it); // dispatch on new current node's type & tag
        }
        it.toParent(); // go back up the tree
    }
    // do the cleanup
}
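Here is a minimal sketch of such a traverser. It assumes a hypothetical flat Event type standing in for what a lexical layer might emit. TreeTraverser keeps only a position, a depth counter, and one flag. It reads the events strictly forward, with one event of lookahead, so a streaming parser could in principle produce them on demand. The vector in the example merely stands in for that stream. The navigation method names follow the pseudocode. Attributes are omitted, and the Event type, the isText/tag/text accessors, and the outline-printing process function are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A flat parse event, as a lexical layer might produce it (hypothetical type).
struct Event {
    enum Kind { Start, End, Text } kind;
    std::string data;  // tag name for Start/End, character data for Text
};

// Forward-only traverser: no node objects, just a position and a little state.
class TreeTraverser {
public:
    explicit TreeTraverser(const std::vector<Event>& events) : ev_(events) {}

    bool isText() const { return !closed_ && ev_[pos_].kind == Event::Text; }
    const std::string& tag() const { return ev_[pos_].data; }   // Start or End
    const std::string& text() const { return ev_[pos_].data; }  // Text

    // True if the current node is an element with at least one child.
    bool hasChildren() const {
        return !closed_ && ev_[pos_].kind == Event::Start &&
               ev_[pos_ + 1].kind != Event::End;
    }
    void toFirstChild() {
        assert(hasChildren());
        ++pos_;
        ++depth_;
    }
    // True once the children of the current parent are exhausted.
    bool atEnd() const {
        return !closed_ && (pos_ >= ev_.size() || ev_[pos_].kind == Event::End);
    }
    void toNextSibling() {
        if (!closed_ && ev_[pos_].kind == Event::Start) {
            // Unvisited element: skip its whole subtree, reading forward only.
            int d = 0;
            do {
                if (ev_[pos_].kind == Event::Start) ++d;
                else if (ev_[pos_].kind == Event::End) --d;
                ++pos_;
            } while (d > 0);
        } else {
            ++pos_;  // a text node, or an element whose end tag we are on
        }
        closed_ = false;
    }
    void toParent() {
        assert(atEnd() && depth_ > 0);
        --depth_;
        closed_ = true;  // the parent's end tag is now the current node
    }

private:
    const std::vector<Event>& ev_;
    std::size_t pos_ = 0;
    int depth_ = 0;
    bool closed_ = false;
};

// Same shape as Foo::process; "setup" and "cleanup" emit an indented outline.
void process(TreeTraverser& it, std::string& out, int indent = 0) {
    std::string pad(indent * 2, ' ');
    if (it.isText()) {
        out += pad + "\"" + it.text() + "\"\n";
        return;
    }
    out += pad + "<" + it.tag() + ">\n";   // setup
    if (it.hasChildren()) {
        for (it.toFirstChild(); !it.atEnd(); it.toNextSibling()) {
            process(it, out, indent + 1);  // dispatch on the child
        }
        it.toParent();                     // go back up the tree
    }
    out += pad + "</" + it.tag() + ">\n";  // cleanup
}
```

The design choice here is that toParent does not move backward. It marks the parent's end tag as the current node, so the caller's next toNextSibling simply steps past it.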
Note that if your parser has this interface, you may never have to actually
build the whole tree. Similarly, on the output side you can write to a
"tree constructor" that never builds a tree at all and merely appends
characters to a string.
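The output side can be sketched the same way. This is a minimal example with hypothetical class names (TreeConstructor, StringConstructor) and a hypothetical emitGreeting processor. The processor writes its result through an abstract interface. One implementation just appends markup to a string as the calls arrive, so no output tree is held in memory.

```cpp
#include <cassert>
#include <string>

// Hypothetical output interface: a processor "constructs a tree" via these calls.
class TreeConstructor {
public:
    virtual ~TreeConstructor() = default;
    virtual void startTag(const std::string& tag) = 0;
    virtual void endTag(const std::string& tag) = 0;
    virtual void text(const std::string& data) = 0;
};

// Builds no nodes at all: it only appends characters to a string.
class StringConstructor : public TreeConstructor {
public:
    void startTag(const std::string& tag) override { out_ += "<" + tag + ">"; }
    void endTag(const std::string& tag) override { out_ += "</" + tag + ">"; }
    void text(const std::string& data) override {
        for (char c : data) {  // escape markup-significant characters
            if (c == '<') out_ += "&lt;";
            else if (c == '&') out_ += "&amp;";
            else out_ += c;
        }
    }
    const std::string& str() const { return out_; }

private:
    std::string out_;
};

// A processor that does not know whether a tree or a string is being built.
void emitGreeting(TreeConstructor& out, const std::string& name) {
    out.startTag("p");
    out.text("Hello, " + name);
    out.startTag("b");
    out.text("<3");
    out.endTag("b");
    out.endTag("p");
}
```

With a StringConstructor, `emitGreeting(s, "A&B")` leaves `<p>Hello, A&amp;B<b>&lt;3</b></p>` in `s.str()`. You could substitute a DOM-building implementation of the same interface without touching the processor.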
We've built a document-processing system (currently in Java) using this kind
of interface; you can find it at <http://RiSource.org/PIA/>.
--
Stephen R. Savitzky <steve@rsv.ricoh.com> <http://rsv.ricoh.com/~steve/>
Platform for Information Applications: <http://RiSource.org/PIA/>
Chief Software Scientist, Ricoh Silicon Valley, Inc. Calif. Research Center
voice: 650.496.5710 front desk: 650.496.5700 fax: 650.854.8740
home: <steve@theStarport.org> URL: http://theStarport.org/people/steve/
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1