Elliotte Rusty Harold wrote:
> Still, you could probably rig something up by writing a SAX program to
> generate traces from this test suite using an existing SAX parser. Hmmm,
> in fact you could use several parsers to generate stack traces and see
> what popped out and where different parsers differed from each other.
I guess the first part would be writing a ContentHandler that outputs a
canonicalised dump of what is passed to it. By canonicalised I mean
hiding the nondeterminism exhibited by characters() by merging adjacent
invocations into one, assigning an arbitrary but stable order to
attributes, and smoothing over any other niggling nondeterminisms.
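A minimal sketch of such a handler, using Python's xml.sax module rather than the Java SAX API for brevity. The names CanonicalDumpHandler and dump, and the exact event line format, are made up for illustration; it also assumes namespace processing is off, so only startElement/endElement are handled.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class CanonicalDumpHandler(ContentHandler):
    """Hypothetical handler: records SAX events as canonicalised lines."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._text = []  # characters() chunks not yet written out

    def _flush_text(self):
        # Merge adjacent characters() calls into one event, so that
        # parser-specific chunking does not show up in the dump.
        if self._text:
            self.lines.append("characters(%r)" % "".join(self._text))
            self._text = []

    def characters(self, content):
        self._text.append(content)

    def startElement(self, name, attrs):
        self._flush_text()
        # Sort attributes by name: arbitrary, but stable across parsers.
        pairs = " ".join("%s=%r" % (k, attrs[k]) for k in sorted(attrs.keys()))
        self.lines.append("startElement(%s%s)" % (name, " " + pairs if pairs else ""))

    def endElement(self, name):
        self._flush_text()
        self.lines.append("endElement(%s)" % name)

    def endDocument(self):
        self._flush_text()


def dump(xml_bytes):
    """Parse xml_bytes with the default parser and return the event lines."""
    handler = CanonicalDumpHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.lines
```

Calling dump(b'<a y="2" x="1">Te&amp;st</a>') should give one startElement line with x before y, a single characters line for the whole text, and one endElement line, however the parser chose to split the text around the entity reference.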
To make things simple, I'd make it output any characters in a hex
rendition of their UTF-8 codes - because when you're testing conformance
with funny characters, you don't want to inadvertently output anything
that upsets the diff tool.
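The hex rendition could be as simple as the following sketch. The function name hex_utf8 is hypothetical, and the choice of uppercase, space-separated byte pairs is an assumption.

```python
def hex_utf8(text):
    """Render text as space-separated hex pairs of its UTF-8 bytes."""
    # Uppercase, two digits per byte; the dump then contains only
    # ASCII hex digits and spaces, whatever the input characters were.
    return " ".join("%02X" % byte for byte in text.encode("utf-8"))
```

For example, hex_utf8("Test 1") gives "54 65 73 74 20 31".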
Maybe it would have to begin with a dump of the status of the features
and options that were passed to the parser, too, to ensure apples are
not compared to oranges?
Its output might be something like:
    http://xml.org/sax/features/namespaces = true
    http://xml.org/sax/features/namespace-prefixes = true
    http://xml.org/sax/features/validation = false
    characters(54 65 73 74 20 31)
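The feature header could be produced along these lines, again sketched with Python's xml.sax. The function name feature_header and the "unsupported" marker for features a parser refuses to report are assumptions, not part of any standard.

```python
import xml.sax
from xml.sax import SAXNotRecognizedException, SAXNotSupportedException, handler

# The three standard SAX feature URIs shown in the sample output.
FEATURES = [
    handler.feature_namespaces,
    handler.feature_namespace_prefixes,
    handler.feature_validation,
]


def feature_header(parser, features=FEATURES):
    """Return one 'uri = state' line per feature for the given parser."""
    lines = []
    for name in features:
        try:
            state = "true" if parser.getFeature(name) else "false"
        except (SAXNotRecognizedException, SAXNotSupportedException):
            # Hypothetical marker for features the parser will not report.
            state = "unsupported"
        lines.append("%s = %s" % (name, state))
    return lines
```

Usage would be something like feature_header(xml.sax.make_parser()), with the resulting lines written ahead of the event dump.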
Once you have this standard handler, it's easy to feed the same document
through different parsers into the handler and then run diff on the
output.
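If you'd rather not shell out to diff, the comparison step can be sketched with Python's difflib. The function name diff_dumps and the parser labels are hypothetical; the inputs are assumed to be lists of dump lines, one list per parser.

```python
import difflib


def diff_dumps(lines_a, lines_b, name_a="parser-a", name_b="parser-b"):
    """Return a unified diff of two canonical dumps; empty if they agree."""
    return list(
        difflib.unified_diff(
            lines_a, lines_b, fromfile=name_a, tofile=name_b, lineterm=""
        )
    )
```

An empty result means the two parsers reported the same events for that document; anything else shows where they diverged.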
> I could see a SAX ContentHandler that generated an XML document to
> report what had been reported, and XSLT stylesheets that could compare
> one document to another, similar to how the OASIS XSLT Test suite works.
Be mindful of how strings are represented if the output is XML, since
you may well want to test the behaviour of parsers when presented with
XML containing illegal characters. Dumping to hex is quick and easy,
but not as clear as full-on character escaping, where control
characters are converted to empty elements like <illegal-char
codepoint="3" /> or something.
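A rough sketch of that escaping approach follows. The function names are hypothetical, the codepoint is written in decimal as an assumption, and "illegal" is taken to mean anything outside the XML 1.0 Char production.

```python
from xml.sax.saxutils import escape


def is_xml_char(cp):
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def escape_text(text):
    """Escape text for an XML report, replacing illegal characters."""
    out = []
    for ch in text:
        cp = ord(ch)
        if is_xml_char(cp):
            out.append(escape(ch))  # handles &, < and >
        else:
            # Illegal in XML 1.0, so emit a placeholder element instead.
            out.append('<illegal-char codepoint="%d" />' % cp)
    return "".join(out)
```

For example, escape_text("a\x03b<") gives 'a<illegal-char codepoint="3" />b&lt;'.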
One advantage of a strictly defined output format like the one I gave
above is that you can just run diff on it rather than fiddling with
XSLT - but I'm on Unix, so running diff on things is very natural for
me; less so for Windows folks, of course.