xml-dev - RFC: "even simpler" C++ XML parser for object hierarchies

From: Paul Miller <stele@fxtech.com>
To: xml-dev <xml-dev@ic.ac.uk>
Date: Tue, 07 Dec 1999 19:38:34 -0500

Thanks to all who have given feedback on my desire for a relatively
atypical parsing idiom for XML. Some of my interest is based on a
proprietary parser I wrote a few years ago and have used for
everything since. It's tag-based and object-oriented, and each block of
a document can be parsed as a complete unit. When used to parse
object-oriented data, it lets each object easily handle its own parsing.

Now I'd like to apply the same concepts to an XML parser, used primarily
when object-oriented program data is stored as XML syntax.

I believe the best way to describe what I want to do (and why) is to
show a concrete example. Suppose I have a program that generates images
composed of layers with multiple objects in each layer. Each layer has a
size associated with it as well.

The classes I have are:
	Document (contains one or more layers)
	Layer (contains one or more objects and a Size)
	Object (some type of object)
	Size (an object which represents a width and height)
	Point (x,y value)
	Rect (x1,y1 to x2,y2)
	Circle (type/subclass of Object)
	Square (type/subclass of Object)

Ideally, each object would be able to write out its data in XML form,
and parse its own data (along with a list of attributes if it uses
them).
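
As a rough illustration of the "write out its data" half of that idea, here is a minimal sketch in which a Size object emits its own element. The WriteXML and SizeToXML names, the use of std::ostream, and the WIDTHxHEIGHT text encoding are assumptions made for this example, not part of any existing API.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical Size class; WriteXML is an assumed name.
struct Size {
    int width;
    int height;

    // Emit this object as <Size>WIDTHxHEIGHT</Size>.
    void WriteXML(std::ostream &out) const {
        out << "<Size>" << width << 'x' << height << "</Size>";
    }
};

// Convenience helper (also hypothetical) that returns the XML as a string.
inline std::string SizeToXML(const Size &size) {
    std::ostringstream out;
    size.WriteXML(out);
    return out.str();
}
```

A container such as Layer could write its opening tag, ask each member to write itself in the same way, and then write its closing tag.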

Here is an example XML file:

<Document name="mydocument">
	<Layer name="background">
		<Size>640x480</Size>
		<Object type="circle">
			<Center>320,240</Center>
			<Radius>25.0</Radius>
		</Object>
		<Object type="square">
			<Rect>10,10-40,40</Rect>
		</Object>
	</Layer>
</Document>

If you think about the object hierarchy associated with this document,
you have something like this:

Document
	contains Layer ("background")
		contains Size (640x480)
		contains Circle (Object)
			contains Point (320,240)
			contains float (25.0)
		contains Square (Object)
			contains Rect (10,10 - 40,40)

I tend to design APIs from the point of view of the programmer. As
the number of classes in my application grows, I want to minimize the
amount of extra code I have to write, so I'd like to reduce the
parsing to the minimum amount of necessary boilerplate code. So
let's assume that each object has its own Parse() method. This method
gets called with an XML::Element object which has the name and
attributes for that object. Parsing of the entire object should be an
atomic operation.

I use static function pointers as callbacks to avoid having to subclass
from any XML-specific classes. User-data is passed along in the parsing
so we can cast it back to the necessary type in one of the element
handlers. The code is presented in C++ but the parsing operations can
easily have a "C" interface. Exceptions are thrown if anything goes
wrong, so there are no error codes.
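
To make the callback mechanism concrete, here is a minimal, self-contained sketch of how a handler table with a sentinel entry and a user-data pointer might be dispatched. Everything in the Sketch namespace is a hypothetical stand-in: the Element here is an in-memory tree rather than a stream reader, Dispatch is an invented name, and silently skipping unmatched children is an assumption about the intended behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace Sketch {

// Stand-in for XML::Element: just a name plus child elements.
struct Element {
    std::string name;
    std::vector<Element> children;
};

typedef void (*HandlerFunc)(const Element &elem, void *userData);

// Stand-in for XML::ElementHandler.
struct ElementHandler {
    const char *name;   // element name to match; NULL marks the end
    HandlerFunc func;   // static callback to invoke
    void *userData;     // optional per-handler user data (may be NULL)
};

// Call the matching handler for each child of `parent`. A handler's own
// userData takes precedence over the default passed to Dispatch.
// Children with no matching handler are skipped (an assumption).
inline void Dispatch(const Element &parent,
                     const ElementHandler *handlers,
                     void *defaultUserData) {
    for (std::size_t i = 0; i < parent.children.size(); ++i) {
        const Element &child = parent.children[i];
        for (const ElementHandler *h = handlers; h->name != NULL; ++h) {
            if (child.name == h->name) {
                h->func(child, h->userData ? h->userData : defaultUserData);
                break;
            }
        }
    }
}

// Example client: counts Layer elements without subclassing anything.
struct Counter {
    int layers;
    static void sOnLayer(const Element &, void *userData) {
        static_cast<Counter *>(userData)->layers++;
    }
};

// Build a small in-memory tree and dispatch over it.
inline int CountLayers() {
    Element layer;
    layer.name = "Layer";
    Element other;
    other.name = "Comment";

    Element doc;
    doc.name = "Document";
    doc.children.push_back(layer);
    doc.children.push_back(layer);
    doc.children.push_back(other);

    Counter counter = {0};
    const ElementHandler handlers[] = {
        {"Layer", Counter::sOnLayer, NULL},
        {NULL, NULL, NULL}  // sentinel, playing the role of END
    };
    Dispatch(doc, handlers, &counter);
    return counter.layers;
}

}  // namespace Sketch
```

Because the handlers are plain function pointers plus a void pointer, the same table layout could be exposed through a "C" interface unchanged.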

Here is the code needed to open the XML file and find the top-level XML
element:

void App::LoadDocument(const char *path)
{
	// specify a handler to look for "Document" elements
	XML::ElementHandler handlers[] = {
		XML::ElementHandler("Document", sParseDocument),
		XML::ElementHandler::END
	};
	XML::Input file(path);
	// each Document found is added to the App by sParseDocument, so
	// nothing is returned here
	file.Parse(handlers, this);
}

From here on out each object is responsible for parsing itself, based on
an XML::Element object that is passed to it. Please examine the code
closely to see the intended design and flow.

// when a Document element is found, it is passed to the
// sParseDocument handler
void App::sParseDocument(const XML::Element &elem, void *userData)
{
	// userData is the App * from the file.Parse() call above
	App *app = (App *)userData;
	// we found a document element, so make one using the attributes
	Document *doc = new Document(elem.GetAttribute("name"));
	// now parse the document
	doc->Parse(elem);
	// if we get here without a thrown exception, the Document parsed
	// okay and we can add it
	app->AddDocument(doc);
}

void Document::Parse(const XML::Element &elem)
{
	// specify handlers to look for "Layer" elements
	XML::ElementHandler handlers[] = {
		XML::ElementHandler("Layer", sParseLayer),
		XML::ElementHandler::END
	};
	elem.Parse(handlers, this);
	// if we needed to do something special, like validating the
	// document, we could do it right here
}

void Document::sParseLayer(const XML::Element &elem, void *userData)
{
	// again, userData is the Document * passed in elem.Parse() above
	Document *doc = (Document *)userData;
	// make a new layer
	Layer *layer = new Layer(elem.GetAttribute("name"));
	// parse the layer
	layer->Parse(elem);
	doc->AddLayer(layer);
}

void Layer::Parse(const XML::Element &elem)
{
	// specify handlers to look for "Size" and "Object" elements
	// note that for the Size element we call the Size object's static
	// parse function directly, and we're passing the address of our
	// contained Size member as its user-data, so we do not need to
	// provide an additional static Size handler to forward to the Size
	// object's member Parse() method
	XML::ElementHandler handlers[] = {
		XML::ElementHandler("Size", Size::sParse, &mSize),
		XML::ElementHandler("Object", sParseObject),
		XML::ElementHandler::END
	};
	elem.Parse(handlers, this);
}

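void Size::sParse(const XML::Element &elem, void *userData)
{
	Size *size = (Size *)userData;
	// size has no attributes, just data, so read it directly
	// note that elem.ReadData() reads character data up to the
	// ending element tag and returns the size found
	char tmp[40];
	// leave room for the terminating NUL
	size_t len = elem.ReadData(tmp, sizeof(tmp) - 1);
	tmp[len] = '\0';
	// both fields must be present; std::runtime_error stands in for
	// whatever exception type the parser ends up defining
	if (sscanf(tmp, "%dx%d", &size->width, &size->height) != 2)
		throw std::runtime_error("malformed Size data");
}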
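
The same sscanf approach should carry over to the other text formats in the example file. Below is a minimal sketch for the Center ("320,240") and Rect ("10,10-40,40") data; the struct layouts, the integer coordinates, and the ParsePoint/ParseRect names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical value types; member names and int coordinates are assumed.
struct Point { int x, y; };
struct Rect  { int x1, y1, x2, y2; };

// Parse "x,y". Returns true only if both fields were read.
inline bool ParsePoint(const char *text, Point *out) {
    return std::sscanf(text, "%d,%d", &out->x, &out->y) == 2;
}

// Parse "x1,y1-x2,y2". Returns true only if all four fields were read.
inline bool ParseRect(const char *text, Rect *out) {
    return std::sscanf(text, "%d,%d-%d,%d",
                       &out->x1, &out->y1, &out->x2, &out->y2) == 4;
}
```

A Circle::Parse() could then register handlers for "Center" and "Radius" in the same way Layer::Parse() does for "Size", and raise an error when one of these helpers returns false.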

void Layer::sParseObject(const XML::Element &elem, void *userData)
{
	// again, userData is the Layer * passed in elem.Parse() above
	Layer *layer = (Layer *)userData;
	// make a new object from the object type
	std::string type = elem.GetAttribute("type");
	// I would normally use a factory here but this illustrates the
	// point better
	Object *obj = NULL;
	if (type == "circle")
		obj = new Circle();
	else if (type == "square")
		obj = new Square();
	else
		// unknown type: fail here rather than dereference NULL below
		// (std::runtime_error stands in for the parser's own exception)
		throw std::runtime_error("unknown Object type: " + type);

	// now let the object (whatever type it is) parse itself
	obj->Parse(elem);
	layer->AddObject(obj);
}
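
For comparison, the factory mentioned in the comment above might look something like the following sketch. The ObjectFactory class, its Register/Create methods, and the stripped-down Object hierarchy are all hypothetical; the if/else chain in sParseObject would be replaced by a single Create() call.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical minimal class hierarchy, for illustration only.
struct Object {
    virtual ~Object() {}
    virtual const char *TypeName() const = 0;
};
struct Circle : Object { const char *TypeName() const { return "circle"; } };
struct Square : Object { const char *TypeName() const { return "square"; } };

// Creators are plain function pointers, in keeping with the callback
// style used for the element handlers.
typedef Object *(*CreateFunc)();

inline Object *CreateCircle() { return new Circle(); }
inline Object *CreateSquare() { return new Square(); }

class ObjectFactory {
public:
    // Associate a type attribute value with a creator function.
    void Register(const std::string &type, CreateFunc create) {
        mCreators[type] = create;
    }

    // Returns a new object owned by the caller, or NULL if the type
    // name has not been registered.
    Object *Create(const std::string &type) const {
        std::map<std::string, CreateFunc>::const_iterator it =
            mCreators.find(type);
        if (it == mCreators.end())
            return NULL;
        return it->second();
    }

private:
    std::map<std::string, CreateFunc> mCreators;
};
```

With this in place, adding a new Object subclass means registering one more creator instead of editing the parsing code.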

So I hope this gets the idea across. I'd be interested in feedback.

--
Paul Miller - stele@fxtech.com

