I don't know how much value I add to the discussion, but here are nevertheless
the decisions I made concerning run-time validation.
My project is a regression/automation framework in which a central
"Dispatcher" takes files as input and executes tests out-of-process, with
the files' characteristics deciding which tests are run. The
data that flows (stdin/stdout) between the Dispatcher and the tests is all
in XML formats, and so is all metadata, such as the "Test Descriptors", which
provide information about what types of files a particular test is relevant
to, for example.
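
To make the role of the Test Descriptors a bit more concrete, here is a rough sketch of how a Dispatcher could pick tests from them. The descriptor format is purely an assumption for illustration: the element and attribute names (testDescriptors, test, appliesTo, mimeType), the test names, and the helper functions are all invented, and the real framework's formats may look nothing like this.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptor document; every name in it is invented for this sketch.
DESCRIPTORS = """\
<testDescriptors>
  <test name="check-wellformed" command="check_wellformed.py">
    <appliesTo mimeType="text/xml"/>
    <appliesTo mimeType="application/xhtml+xml"/>
  </test>
  <test name="check-png" command="check_png.py">
    <appliesTo mimeType="image/png"/>
  </test>
</testDescriptors>
"""


def load_descriptors(xml_text):
    """Map each test name to the set of file types it declares itself relevant to."""
    root = ET.fromstring(xml_text)
    return {
        test.get("name"): {a.get("mimeType") for a in test.findall("appliesTo")}
        for test in root.findall("test")
    }


def select_tests(descriptors, mime_type):
    """Return the names of the tests that should run for a file of this type."""
    return sorted(name for name, types in descriptors.items() if mime_type in types)


descriptors = load_descriptors(DESCRIPTORS)
print(select_tests(descriptors, "text/xml"))
```

The point is only that the decision about which tests to run is driven by data in the descriptors rather than hard-coded in the Dispatcher.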
All data, outgoing and incoming, is validated. The answer to why that is the
best approach (at least I hope so!) is found in how the framework is used
and what its goals are.
(Since it hasn't yet been put to use, the discussion is based on how it is
supposed to be used.)
The tests are written by different people, and added on a regular basis.
Hence, there are tests which inevitably are buggy because they are under
development.
The framework's mission is to be user friendly (not requiring manual
intervention, for example) and to provide stable, exact, and correct
results. Its output cannot be nondeterministic.
Since one of the goals is to be robust, and a large part of the whole (the
tests) is constantly in a potentially unstable state of development, the only
option is to validate in order not to compromise the robustness. There is at
least a theoretical performance impact, but I would rather have that than buggy
software that has a high maintenance burden.
I use libxml2 via the Python bindings; the schemata are compiled once, the data
is serialized anyway, and libxml2 is very fast, so I think the validation cost is
close to statistical noise, lost among the context switches for example.
Since the tests create the "uncontrolled environment", validating their data is
perhaps understandable, but why do I validate output data from the
"Dispatcher"? It is, after all, under controlled development, with a finite
development period. Again, it's because I would rather sacrifice performance
than risk potential instability.
Since validation adds the possibility of graceful error handling, it makes a
system much more robust. I don't see how a system could become stable in
real-world conditions without validation (useful validation, that is), unless it is
absolutely _guaranteed_ that the data is correct.
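
As a rough illustration of what I mean by graceful error control, here is a minimal sketch of a Dispatcher running one test out-of-process and validating the data in both directions. It makes several assumptions: the root element names (testRequest, testResult), the status attribute, and the function names are invented, and the validate() function is only a stand-in that checks well-formedness and the root element name with the standard library. In the real setup that step would be schema validation through libxml2.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET


class ValidationError(Exception):
    """Raised when data crossing the process boundary is not acceptable."""


def validate(xml_text, expected_root):
    """Stand-in for schema validation: well-formedness plus root element name."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValidationError(f"not well-formed: {exc}") from exc
    if root.tag != expected_root:
        raise ValidationError(f"expected <{expected_root}>, got <{root.tag}>")
    return root


def run_test(command, request_xml):
    """Run one test out-of-process and always return a (status, detail) pair."""
    try:
        validate(request_xml, "testRequest")          # outgoing data
        proc = subprocess.run(command, input=request_xml,
                              capture_output=True, text=True, timeout=30)
        result = validate(proc.stdout, "testResult")  # incoming data
    except ValidationError as exc:
        return ("invalid-data", str(exc))
    except (OSError, subprocess.TimeoutExpired) as exc:
        return ("not-run", str(exc))
    return (result.get("status", "unknown"), "")


# Two hypothetical tests: one well behaved, one that emits broken XML.
GOOD_TEST = [sys.executable, "-c",
             "import sys; sys.stdin.read(); print('<testResult status=\"pass\"/>')"]
BUGGY_TEST = [sys.executable, "-c",
              "import sys; sys.stdin.read(); print('<testResult status=\"pass\">')"]

print(run_test(GOOD_TEST, "<testRequest/>"))
print(run_test(BUGGY_TEST, "<testRequest/>"))
```

A buggy test then shows up as an ordinary, reportable outcome instead of taking the Dispatcher down with it, and the same check on outgoing data catches mistakes in the Dispatcher itself before a test ever sees them.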
Regarding validator performance, here is a benchmark:
The framework is in-house software for KDE, www.kde.org, developed privately by me, but
it will be published under the GNU GPL in a project-neutral way once it has reached
a state suitable for open source development (post-alpha, basically).