OASIS Mailing List Archives

   Re: [xml-dev] How much run-time validation do you do?

I don't know how much value this adds to the discussion, but here are, 
nevertheless, the decisions I made concerning run-time validation. 

My project is a regression/automation framework[1], where a central 
"Dispatcher" takes files as input and executes tests out-of-process, with 
the files' characteristics deciding which tests are run. The data that 
flows (stdin/stdout) between the Dispatcher and the tests is all in XML 
formats, and so is all meta-data, such as the "Test Descriptors", which 
provide information about, for example, what types of files a particular 
test is relevant to.
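
To make the descriptor idea concrete, here is a minimal sketch of how a 
Dispatcher could pick tests from a file's characteristics. The descriptor 
format, the element and attribute names, and both function names are my own 
assumptions for illustration; the actual Test Descriptor format is not shown 
in this post, and only the file extension is used as a "characteristic".

```python
import xml.etree.ElementTree as ET
from pathlib import PurePath

# Hypothetical Test Descriptor document; the real format is not specified.
DESCRIPTORS = """\
<descriptors>
  <test name="check-wellformed">
    <applies-to extension=".xml"/>
    <applies-to extension=".xsl"/>
  </test>
  <test name="check-desktop-file">
    <applies-to extension=".desktop"/>
  </test>
</descriptors>
"""


def load_descriptors(xml_text):
    """Map each test name to the set of file extensions it is relevant to."""
    root = ET.fromstring(xml_text)
    return {
        test.get("name"): {a.get("extension") for a in test.findall("applies-to")}
        for test in root.findall("test")
    }


def select_tests(descriptors, filename):
    """Return the names of the tests whose descriptor matches the file."""
    ext = PurePath(filename).suffix
    return sorted(name for name, exts in descriptors.items() if ext in exts)


descriptors = load_descriptors(DESCRIPTORS)
print(select_tests(descriptors, "style.xsl"))
print(select_tests(descriptors, "README"))
```

A file that no descriptor claims simply yields an empty list, so the 
Dispatcher has nothing to run for it.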

All data, outgoing and incoming, is validated. Why that is the best 
approach (at least I hope so!) follows from how the framework is used and 
what its goals are.

(Since it hasn't yet been put to use, the discussion is based on how it is 
supposed to be used.)

The tests are written by different people, and added on a regular basis. 
Hence, there are tests which are inevitably buggy because they are under 
development.

The framework's mission is to be user-friendly (not requiring manual 
intervention, for example) and to provide stable, exact, and correct 
results. Its output cannot be nondeterministic.

Since one of the goals is robustness, and a large part of the whole (the 
tests) is constantly in a potentially unstable state of development, the 
only option is to validate, so as not to compromise that robustness. There 
is at least a theoretical performance impact, but I would rather have that 
than buggy software with a high maintenance burden.

I use libxml2 via the Python bindings; the schemata are compiled once, the 
data is serialized anyway, and libxml2 is very fast, so I think the cost of 
validation is close to statistical noise, in between the context switches 
for example.

Since the tests create the "uncontrolled environment", validating them is 
perhaps understandable, but why do I validate output data from the 
"Dispatcher"? It is, after all, under controlled development, with a finite 
development period. Again, it's because I would rather sacrifice 
performance than risk instability.

Since validation adds the possibility of graceful error control, it makes a 
system much more robust. I don't see how a system could become stable in 
real-world conditions without validation (useful validation, that is), 
unless it is absolutely _guaranteed_ that the data is correct.
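
As an illustration of what "graceful error control" can look like, here is 
a rough sketch of running one test out-of-process and checking its output 
before trusting it. Everything named here is an assumption of mine: the 
<result> element, its "status" attribute, and the functions run_test and 
validate_result are hypothetical. Python's standard library has no schema 
validator, so a small hand-written structural check stands in for the 
compiled schema that would normally do this job.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET


def validate_result(xml_bytes):
    """Stand-in for schema validation: return (root, None) or (None, problem)."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return None, f"not well-formed: {exc}"
    if root.tag != "result":
        return None, f"unexpected root element <{root.tag}>"
    if root.get("status") not in ("pass", "fail"):
        return None, "missing or invalid 'status' attribute"
    return root, None


def run_test(argv, input_xml, timeout=30):
    """Run one test as a child process and return (status, detail)."""
    try:
        proc = subprocess.run(
            argv, input=input_xml, capture_output=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "error", f"could not run test: {exc}"
    root, problem = validate_result(proc.stdout)
    if problem is not None:
        # A buggy test becomes a reported error instead of a crash downstream.
        return "error", f"invalid test output: {problem}"
    return root.get("status"), root.text or ""


# Two throwaway "tests": one well-behaved, one that emits truncated XML.
good = [sys.executable, "-c",
        "import sys; sys.stdin.read(); "
        "print('<result status=\"pass\">ok</result>')"]
buggy = [sys.executable, "-c",
         "import sys; sys.stdin.read(); print('<result status=')"]

print(run_test(good, b"<file name='a.xml'/>"))
print(run_test(buggy, b"<file name='a.xml'/>"))
```

The point of the sketch is only the shape: every failure mode of a child 
(cannot start, hangs, emits garbage) ends up as an ordinary "error" result 
that the caller can report, rather than as an exception somewhere later.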

Regarding validator performance, here is a benchmark:



[1] It is in-house software for KDE (www.kde.org), developed privately by 
me, but it will be published under the GNU GPL in a project-neutral way 
once it has reached a state suitable for open source development 
(post-alpha, basically).


Copyright 2001 XML.org. This site is hosted by OASIS