xml-dev - Re: RFC: "even simpler" C++ XML parser for object hierarchies

From: steve@rsv.ricoh.com (Stephen R. Savitzky)
To: xml-dev <xml-dev@ic.ac.uk>
Date: 07 Dec 1999 17:30:17 -0800

This is basically a traditional top-down, recursive-descent parser.
Unfortunately, it's completely different from the way most XML parsers I've
seen work, although I believe there's a lexical layer underneath expat that
can be made to work this way.

But there's a better way of looking at the situation: what you really want
to do is make a top-down traversal of the document's parse tree.  In other
words, at any given position in the tree, you want to do something like the
following pseudocode:

// process a <foo> element.
void Foo::process(const XML::Element &elem) {
   // do the setup
   for (XML::Node *node = elem.getFirstChild();
        node != nullptr;
        node = node->getNextSibling())
   {
       processChild(node); // dispatch on node's type & tag
   }
   // do the cleanup
}
 
This works as-is if the result of your parse is a DOM tree or some
equivalent parse-tree representation of the document, but trees take memory.
So the next step is to use a parser that looks like a tree traverser:

// process a <foo> element.
void Foo::process(TreeTraverser &it) {
   // do the setup, using it.getAttrList(), etc. on the current node
   if (it.hasChildren()) {
       for (it.toFirstChild(); !it.atEnd(); it.toNextSibling()) {
           processChild(it); // dispatch on new current node's type & tag
       }
       it.toParent();  // go back up the tree
   }
   // do the cleanup
}
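
To make the traverser loop concrete, here is a minimal sketch of what such
a cursor might look like.  Everything in it is an assumption made for
illustration: the Node struct, the TreeTraverser class with its stack of
frames, and the process() and outline() helpers are hypothetical, not the
PIA interface.  It walks an in-memory tree only so that the example is
self-contained; a real parser could supply the same calls while reading
its input incrementally.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory node, used here only as a stand-in for parser input.
struct Node {
    std::string tag;
    std::vector<Node> children;
};

// Hypothetical cursor offering the navigation calls used in the loop above.
class TreeTraverser {
public:
    explicit TreeTraverser(const Node &root) : root_(&root) {}

    const std::string &getTagName() const { return current().tag; }
    bool hasChildren() const { return !current().children.empty(); }

    // True once toNextSibling() has stepped past the last sibling.
    bool atEnd() const {
        return !stack_.empty() &&
               stack_.back().index >= stack_.back().parent->children.size();
    }

    void toFirstChild() {
        assert(hasChildren());
        stack_.push_back(Frame{&current(), 0});
    }
    void toNextSibling() {
        assert(!stack_.empty());
        ++stack_.back().index;
    }
    void toParent() {
        assert(!stack_.empty());
        stack_.pop_back();
    }

private:
    // One frame per level below the root: a parent and a child position.
    struct Frame {
        const Node *parent;
        std::size_t index;
    };

    const Node &current() const {
        if (stack_.empty()) return *root_;
        return stack_.back().parent->children[stack_.back().index];
    }

    const Node *root_;
    std::vector<Frame> stack_;
};

// Same shape as Foo::process above: setup, loop over children, go back up.
void process(TreeTraverser &it, std::size_t depth, std::string &out) {
    out.append(2 * depth, ' ');   // "setup": emit the indented tag name
    out += it.getTagName();
    out += '\n';
    if (it.hasChildren()) {
        for (it.toFirstChild(); !it.atEnd(); it.toNextSibling()) {
            process(it, depth + 1, out);
        }
        it.toParent();            // go back up the tree
    }
}

// Convenience wrapper: returns an indented outline of the tree.
std::string outline(const Node &root) {
    TreeTraverser it(root);
    std::string out;
    process(it, 0, out);
    return out;
}
```

The processing code never touches a Node directly, so the in-memory tree
could be swapped for a streaming source without changing process().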
 
Note that if your parser has this interface, you may never have to actually
build the whole tree.  Similarly, you can output to a ``tree constructor''
that merely appends characters to a string.
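
The output side can be sketched the same way.  The names below
(TreeConstructor, StringConstructor, emitSample) are hypothetical and exist
only for this example; the point is that the processing code talks to an
abstract constructor, and one implementation of it builds no nodes at all
but just appends markup to a string.

```cpp
#include <string>

// Hypothetical output interface: receives the tree one event at a time.
class TreeConstructor {
public:
    virtual ~TreeConstructor() = default;
    virtual void startElement(const std::string &tag) = 0;
    virtual void text(const std::string &chars) = 0;
    virtual void endElement(const std::string &tag) = 0;
};

// Builds no nodes: every event is appended to a string as markup.
class StringConstructor : public TreeConstructor {
public:
    void startElement(const std::string &tag) override {
        out_ += "<" + tag + ">";
    }
    void text(const std::string &chars) override {
        // Escape the characters that are special in XML character data.
        for (char c : chars) {
            switch (c) {
                case '<': out_ += "&lt;";  break;
                case '>': out_ += "&gt;";  break;
                case '&': out_ += "&amp;"; break;
                default:  out_ += c;       break;
            }
        }
    }
    void endElement(const std::string &tag) override {
        out_ += "</" + tag + ">";
    }
    const std::string &str() const { return out_; }

private:
    std::string out_;
};

// Example processing code: it sees only the abstract interface, so the
// same calls could just as well feed a constructor that builds real nodes.
void emitSample(TreeConstructor &out) {
    out.startElement("foo");
    out.startElement("bar");
    out.text("a < b");
    out.endElement("bar");
    out.endElement("foo");
}
```

Calling emitSample() on a StringConstructor leaves the serialized document
in str(); no tree is ever held in memory.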

We've built a document-processing system (currently in Java) using this kind
of interface; you can find it at  <http://RiSource.org/PIA/>.

-- 
Stephen R. Savitzky  <steve@rsv.ricoh.com>  <http://rsv.ricoh.com/~steve/>
Platform for Information Applications:      <http://RiSource.org/PIA/>
Chief Software Scientist, Ricoh Silicon Valley, Inc. Calif. Research Center
 voice: 650.496.5710  front desk: 650.496.5700  fax: 650.854.8740 
  home: <steve@theStarport.org> URL: http://theStarport.org/people/steve/

References:
- RFC: "even simpler" C++ XML parser for object hierarchies
  - From: Paul Miller <stele@fxtech.com>
