   Re: RFC: "even simpler" C++ XML parser for object hierarchies

  • From: steve@rsv.ricoh.com (Stephen R. Savitzky)
  • To: xml-dev <xml-dev@ic.ac.uk>
  • Date: 07 Dec 1999 17:30:17 -0800

This is basically a traditional top-down, recursive-descent parser.
Unfortunately, it's completely different from the way most XML parsers I've
seen work, although I believe there's a lexical layer underneath expat that
can be made to work this way.

But there's a better way of looking at the situation, namely that what you
really want to do is make a top-down traversal of the document's parse tree.
In other words, at any given position in the tree, you want to do pseudocode
like

// process a <foo> element.
void Foo::process(const XML::Element &elem) {
   // do the setup
   for (XML::Node *node = elem.getFirstChild();
        node != nullptr;
        node = node->getNextSibling())
   {
        processChild(node); // dispatch on node's type & tag
   }
   // do the cleanup
}
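
For anyone who wants to compile the idea, here is a minimal sketch of the
same loop with the dispatch step filled in. DomNode, NodeType,
processElement and demoDispatch are hypothetical names made up for this
example; they are not the DOM API or any real parser's API, and the "tree"
is just a few hand-linked nodes.

```cpp
#include <cassert>
#include <string>

// Hypothetical node type, standing in for whatever the parser produces.
enum class NodeType { Element, Text };

struct DomNode {
    NodeType type;
    std::string value;               // tag for elements, characters for text
    DomNode *firstChild = nullptr;
    DomNode *nextSibling = nullptr;
};

void processElement(const DomNode &elem, std::string &out);

// Dispatch on the node's type (a real handler would also look at the tag).
void processChild(const DomNode &node, std::string &out) {
    switch (node.type) {
    case NodeType::Text:
        out += node.value;
        break;
    case NodeType::Element:
        processElement(node, out);
        break;
    }
}

void processElement(const DomNode &elem, std::string &out) {
    assert(elem.type == NodeType::Element);
    out += "<" + elem.value + ">";                  // setup
    for (const DomNode *n = elem.firstChild; n != nullptr; n = n->nextSibling) {
        processChild(*n, out);
    }
    out += "</" + elem.value + ">";                 // cleanup
}

// Builds <p>hi<b></b></p> by hand and walks it top-down.
std::string demoDispatch() {
    DomNode text{NodeType::Text, "hi"};
    DomNode b{NodeType::Element, "b"};
    DomNode p{NodeType::Element, "p"};
    p.firstChild = &text;
    text.nextSibling = &b;
    std::string out;
    processElement(p, out);
    return out;
}
```

Calling demoDispatch() returns the string "<p>hi<b></b></p>".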
 
This works as-is if the result of your parse is a DOM tree or some
equivalent parse-tree representation of the document, but trees take memory.
So the next step is to use a parser that looks like a tree traverser:

// process a <foo> element.
void Foo::process(TreeTraverser &it) {
   // do the setup, using it.getAttrList(), etc. on the current node
   if (it.hasChildren()) {
       for (it.toFirstChild(); !it.atEnd(); it.toNextSibling()) {
           processChild(it); // dispatch on new current node's type & tag
       }
       it.toParent();  // go back up the tree
   }
   // do the cleanup
}
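
To make the cursor semantics concrete, here is a rough sketch of one way
such a traverser could behave. It is an assumption-laden toy: the class
below only borrows the method names from the pseudocode, Node and
demoTraversal are invented for the example, and the cursor is backed by an
in-memory tree purely to keep it short. A streaming parser could offer the
same methods without holding the tree.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal node (vector of an incomplete type needs C++17).
struct Node {
    std::string tag;
    std::vector<Node> children;
};

// Hypothetical cursor: a stack of (parent, child index) frames.
class TreeTraverser {
public:
    explicit TreeTraverser(const Node &root) : root_(&root) {}

    bool hasChildren() const { return !current().children.empty(); }
    // True once toNextSibling() has stepped past the last child.
    bool atEnd() const {
        return !stack_.empty() &&
               stack_.back().index >= stack_.back().parent->children.size();
    }
    const std::string &tag() const { return current().tag; }

    void toFirstChild() { stack_.push_back(Frame{&current(), 0}); }
    void toNextSibling() { assert(!stack_.empty()); ++stack_.back().index; }
    void toParent() { assert(!stack_.empty()); stack_.pop_back(); }

private:
    struct Frame { const Node *parent; std::size_t index; };

    const Node &current() const {
        assert(!atEnd());
        if (stack_.empty()) return *root_;
        const Frame &f = stack_.back();
        return f.parent->children[f.index];
    }

    const Node *root_;
    std::vector<Frame> stack_;
};

// Same shape as the pseudocode: setup, walk the children, cleanup.
void process(TreeTraverser &it, std::string &out) {
    out += "<" + it.tag() + ">";
    if (it.hasChildren()) {
        for (it.toFirstChild(); !it.atEnd(); it.toNextSibling()) {
            process(it, out);
        }
        it.toParent();  // go back up the tree
    }
    out += "</" + it.tag() + ">";
}

std::string demoTraversal() {
    Node root{"doc", {Node{"head", {}}, Node{"body", {Node{"p", {}}}}}};
    TreeTraverser it(root);
    std::string out;
    process(it, out);
    return out;
}
```

demoTraversal() returns "<doc><head></head><body><p></p></body></doc>".
The point of the frame stack is that toParent() works even after the loop
has run off the end of the child list, which is exactly where the
pseudocode calls it.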
 
Note that if your parser has this interface, you may never have to actually
build the whole tree.  Similarly, you can output to a ``tree constructor''
that merely appends characters to a string.
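
As an illustration of the output side, here is a small sketch of what a
string-appending constructor might look like. TreeConstructor,
StringConstructor and their method names are assumptions invented for this
example, not the interface of any particular system, and the sketch does no
escaping of markup characters.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical output interface: the processor calls these instead of
// allocating nodes itself.
class TreeConstructor {
public:
    virtual ~TreeConstructor() = default;
    virtual void startElement(const std::string &tag) = 0;
    virtual void text(const std::string &chars) = 0;
    virtual void endElement() = 0;
};

// A "constructor" that builds no tree at all; it just appends to a string.
class StringConstructor : public TreeConstructor {
public:
    void startElement(const std::string &tag) override {
        out_ += "<" + tag + ">";
        open_.push_back(tag);          // remember the tag for endElement()
    }
    void text(const std::string &chars) override {
        out_ += chars;                 // no escaping in this sketch
    }
    void endElement() override {
        assert(!open_.empty());
        out_ += "</" + open_.back() + ">";
        open_.pop_back();
    }
    const std::string &str() const { return out_; }

private:
    std::string out_;
    std::vector<std::string> open_;    // currently open element tags
};

std::string demoConstructor() {
    StringConstructor sc;
    TreeConstructor &out = sc;         // callers see only the interface
    out.startElement("p");
    out.text("hello");
    out.endElement();
    return sc.str();
}
```

demoConstructor() returns "<p>hello</p>". A second implementation of the
same interface could allocate real nodes, and the processing code would not
need to know which one it is writing to.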

We've built a document-processing system (currently in Java) using this kind
of interface; you can find it at  <http://RiSource.org/PIA/>.

-- 
Stephen R. Savitzky  <steve@rsv.ricoh.com>  <http://rsv.ricoh.com/~steve/>
Platform for Information Applications:      <http://RiSource.org/PIA/>
Chief Software Scientist, Ricoh Silicon Valley, Inc. Calif. Research Center
 voice: 650.496.5710  front desk: 650.496.5700  fax: 650.854.8740 
  home: <steve@theStarport.org> URL: http://theStarport.org/people/steve/







 
