
   Re: [xml-dev] SAX Test suite

Elliotte Rusty Harold wrote:

> Still, you could probably rig something up by writing a SAX program to 
> generate traces from this test suite using an existing SAX parser. Hmmm, 
> in fact you could use several parsers to generate stack traces and see 
> what popped out and where different parsers differed from each other. 

I guess the first part would be writing a ContentHandler that outputs a 
canonicalised dump of what is passed to it. Canonicalised here means 
hiding the nondeterminism exhibited by characters() by merging adjacent 
invocations into one, assigning an arbitrary but stable order to 
attributes, and smoothing over any other niggling nondeterminisms.
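
A minimal sketch of such a handler, using Python's standard xml.sax 
binding rather than the Java SAX API. The class name, the dump() helper 
and the line format are all made up for illustration; sorting attributes 
by name is just one possible "arbitrary but stable" order.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class CanonicalDumpHandler(ContentHandler):
    """Records SAX events as lines, hiding parser-dependent details.

    Hypothetical class: the name and the line format are illustrative.
    """

    def __init__(self):
        super().__init__()
        self.lines = []
        self._text = []  # character data seen but not yet emitted

    def _flush_text(self):
        # Merge all adjacent characters() calls into one logical event.
        if self._text:
            self.lines.append("characters(%s)" % "".join(self._text))
            self._text = []

    def characters(self, content):
        self._text.append(content)

    def startElement(self, name, attrs):
        self._flush_text()
        # Sort attributes by name: arbitrary, but stable across parsers.
        rendered = "".join(" %s=%r" % (k, v) for k, v in sorted(attrs.items()))
        self.lines.append("startElement(%s%s)" % (name, rendered))

    def endElement(self, name):
        self._flush_text()
        self.lines.append("endElement(%s)" % name)

    def endDocument(self):
        self._flush_text()


def dump(xml_bytes):
    """Parse a document and return the canonicalised event lines."""
    handler = CanonicalDumpHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.lines
```

However the parser chooses to split "x&amp;y" across characters() 
calls, the dump contains a single characters(x&y) line.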

To make things simple, I'd make it output all character data as a hex 
rendition of its UTF-8 bytes, because when you're testing conformance 
with funny characters you don't want to inadvertently output anything 
that upsets the diff tool.
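
The hex rendition could be as small as this. The function name is 
hypothetical, and two-digit uppercase hex separated by spaces is an 
assumption about the exact format.

```python
def hex_utf8(text):
    """Render a string as space-separated hex values of its UTF-8 bytes."""
    return " ".join("%02X" % byte for byte in text.encode("utf-8"))
```

With this, hex_utf8("Test 1") gives "54 65 73 74 20 31", which is pure 
ASCII no matter what the document contained.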

Maybe it would have to begin with a dump of the status of the features 
and options that were passed to the parser, too, to ensure apples are 
not compared to oranges?

Its output might be something like:

http://xml.org/sax/features/namespaces = true
http://xml.org/sax/features/namespace-prefixes = true
http://xml.org/sax/features/validation = false
characters(54 65 73 74 20 31)
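
A sketch of how the feature header might be produced, again with 
Python's xml.sax. The FEATURES list and feature_header() are names 
invented here, and reporting "unsupported" for a feature the parser 
rejects is an assumption about what the dump should say in that case.

```python
import xml.sax
from xml.sax import SAXNotRecognizedException, SAXNotSupportedException

# The three features shown in the sample output above.
FEATURES = [
    "http://xml.org/sax/features/namespaces",
    "http://xml.org/sax/features/namespace-prefixes",
    "http://xml.org/sax/features/validation",
]


def feature_header(parser, features=FEATURES):
    """Return one 'name = value' line per feature, in a fixed order."""
    lines = []
    for name in features:
        try:
            value = "true" if parser.getFeature(name) else "false"
        except (SAXNotRecognizedException, SAXNotSupportedException):
            # Assumption: record the failure instead of aborting the run.
            value = "unsupported"
        lines.append("%s = %s" % (name, value))
    return lines


if __name__ == "__main__":
    parser = xml.sax.make_parser()
    parser.setFeature("http://xml.org/sax/features/namespaces", True)
    print("\n".join(feature_header(parser)))
```

Two dumps whose headers differ were not produced under the same 
conditions, and diff will show that on the very first lines.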

Once you have this standard handler, it's easy to feed the same document 
through different parsers into it and then run diff on the outputs.
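
For anyone without a diff tool to hand, the comparison step can be 
sketched with Python's difflib. The function name and the parser labels 
are hypothetical; the traces are assumed to be lists of lines already 
produced by the handler.

```python
import difflib


def diff_traces(trace_a, trace_b, name_a="parser-a", name_b="parser-b"):
    """Return unified-diff lines between two event traces.

    An empty list means both parsers reported the same events.
    """
    return list(
        difflib.unified_diff(
            trace_a, trace_b, fromfile=name_a, tofile=name_b, lineterm=""
        )
    )


if __name__ == "__main__":
    # Hand-written stand-ins for real parser output.
    a = ["characters(54 65 73 74 20 31)"]
    b = ["characters(54 65 73 74 20 32)"]
    print("\n".join(diff_traces(a, b)))
```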

> I could see a SAX ContentHandler that generated an XML document to 
> report what had been reported, and XSLT stylesheets that could compare 
> one document to another, similar to how the OASIS XSLT Test suite works.

Be mindful of how strings are represented if the output is XML, since 
you may well want to test how parsers behave when presented with XML 
containing illegal characters. Dumping to hex is quick and easy, but 
not as clear as full-on arbitrary character escaping, where control 
characters are converted to empty elements like <illegal-char 
codepoint="3" /> or something.
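
A rough sketch of that kind of escaping, assuming "illegal" means 
anything outside the XML 1.0 Char production and that the codepoint 
attribute is written in decimal. Both function names are invented for 
illustration.

```python
from xml.sax.saxutils import escape


def is_legal_xml_char(codepoint):
    """True if the code point matches the XML 1.0 Char production."""
    return (
        codepoint in (0x9, 0xA, 0xD)
        or 0x20 <= codepoint <= 0xD7FF
        or 0xE000 <= codepoint <= 0xFFFD
        or 0x10000 <= codepoint <= 0x10FFFF
    )


def escape_for_report(text):
    """Make text safe to embed in an XML report.

    Legal characters get the usual &amp; / &lt; / &gt; escaping; illegal
    ones become empty marker elements so the report stays well-formed.
    """
    out = []
    for ch in text:
        if is_legal_xml_char(ord(ch)):
            out.append(escape(ch))
        else:
            out.append('<illegal-char codepoint="%d" />' % ord(ch))
    return "".join(out)
```

So escape_for_report("a\x03b") yields a<illegal-char codepoint="3" />b, 
which any XML tool downstream can read.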

One advantage of a strictly defined output format like I gave above is 
that you can just run diff on it rather than fiddling with XSLT - but 
I'm on Unix so running diff on things is very natural for me; less so 
for Windows folks, of course.



