Re: [xml-dev] How many unit tests should I create for my XML application?


From: Liam R E Quin <liam@w3.org>
To: "Costello, Roger L." <costello@mitre.org>
Date: Sun, 25 Nov 2012 12:54:58 -0500

On Sat, 2012-11-24 at 18:29 +0000, Costello, Roger L. wrote:
> Hi Folks,
> 
> I am building an XML application. That is, my application consumes XML
> documents and then performs processing. The XML documents conform to
> an XML Schema.

Let's call that an XML-based application to avoid (or create)
confusion....

> How many unit tests should I write for my XML application?

>  Is that a reasonable benchmark -- 6 units tests for each line of XML
> Schema? 

I want to bake a cake. The cake pan weighs 0.5 kg. I weighed my espresso
machine, and the amount of milk I put in my coffee. Is the ratio of
weight of milk to weight of cooking container constant across all foods?

OK, maybe a little ridiculous. But, there's no correlation between
schema size and number of test cases.

Ideally you'd test every possible combination of legal input values -
every possible valid schema instance.

Obviously that's not generally possible.

In programming, the way to make testing feasible is to take three
approaches -
(1) generate lots of random input and see what your program does;
(2) divide the program into small testable units and test each of those
separately;
(3) focus on boundary conditions;
(4) ask the users to test for you - this used to be called "beta
testing" and is now called "agile development" :-)

Note: one reason that testing is needed is that programmers can't count 
to high numbers like 3 or 4...

You're really asking about (2) and (3) here.

For example, if you have
  <xs:element name="socksize" type="xs:integer".....
then a real-life value of <socksize>8.5</socksize> should be tested, and
the beta testers will point out an error in the schema.
But you also should test -1, 0, 1, MAXINT - 1, MAXINT, MAXINT + 1,
-(MAXINT), -(MAXINT - 1), -(MAXINT - 2), (-1 - MAXINT), as well as the
values you expect, such as 6, 7, 8, 9, 10, 11, and the values 1 and 2
beyond each end of that range. You should also have some values like
"small" or "quite comfy actually" that users will enter in apparent
wilful defiance of instructions.
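
As a rough sketch of that list in Python: the function name
socksize_candidates, the expected range 6 to 11, and a 32-bit MAXINT are
all assumptions made for illustration (xs:integer itself has no upper
bound, so pick whatever limit your own code imposes).

```python
# Assumption: a 32-bit signed limit. xs:integer is unbounded, so use the
# limit of whatever integer type your application code actually uses.
MAXINT = 2**31 - 1


def socksize_candidates(lo=6, hi=11, maxint=MAXINT):
    """Return lexical values worth feeding to a socksize element."""
    numbers = [
        -1, 0, 1,
        maxint - 1, maxint, maxint + 1,
        -maxint, -(maxint - 1), -(maxint - 2), -1 - maxint,
    ]
    numbers += list(range(lo, hi + 1))            # the values you expect
    numbers += [lo - 2, lo - 1, hi + 1, hi + 2]   # 1 and 2 outside that range
    values = [str(n) for n in numbers]
    # Real-life input that is not an integer at all.
    values += ["8.5", "small", "quite comfy actually"]
    return values


if __name__ == "__main__":
    for value in socksize_candidates():
        print("<socksize>%s</socksize>" % value)
```

Each string would be dropped into an otherwise valid document and run
through both the schema processor and your own code.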

Then for each of those values you would need to test the full range of
values for your shoesize element; this may lead, during development, to
the addition of a Schematron constraint that shoesize >= socksize. But
you will still need to run the tests during development, in case you
add, in your code,
    socks_per_shoe = shoesize / socksize;
and when socksize is zero your program crashes.
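
A small sketch of how a test would catch that, in Python rather than
whatever language your application uses. The names socks_per_shoe and
crashing_inputs, and the handful of sample sizes, are made up for the
example. Note that every pair tried here already satisfies
shoesize >= socksize, so the constraint alone would not have saved you.

```python
def socks_per_shoe(shoesize, socksize):
    # Naive version of the line above: no guard against a zero socksize.
    return shoesize / socksize


def crashing_inputs(pairs):
    """Return the (shoesize, socksize) pairs that make socks_per_shoe raise."""
    failures = []
    for shoesize, socksize in pairs:
        try:
            socks_per_shoe(shoesize, socksize)
        except ZeroDivisionError:
            failures.append((shoesize, socksize))
    return failures


# A few boundary and ordinary values for each element (hypothetical choices).
pairs = [(shoe, sock) for sock in (-1, 0, 1, 8) for shoe in (0, 8, 9)]
# Keep only pairs the shoesize >= socksize constraint would accept.
pairs = [(shoe, sock) for shoe, sock in pairs if shoe >= sock]

print(crashing_inputs(pairs))
```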

In one schema there might be 100 lines to define those two elements, and
in another schema they might be defined on the same line. It's partly
the amount of documentation, but if we ignore annotations and
comments, we might get a difference of 20 lines compared to 1 line - one
schema using an abstract type, the other schema using XSD built-in types
as I suggested above. Since you are not primarily testing the schema
processor, the purpose of having some invalid values in the input is to
make sure you've defined the schema appropriately.

Once components of a unit (socksize and shoesize, say) are tested, you
test separate units in combination in the same way.

So, if you agree, this gets to your original question. But the
correlation of element definitions in the schema to units in your code
is unlikely to be 1:1, and the interaction may well not be at the XML
level.

So there isn't a right answer to your original question.

Let's look at it another way: suppose your schema defined 100 elements,
and they are all integers (and there's a single external wrapper).

You decide you can test one of them with a large number of values, to
see if the schema processor is behaving correctly. You assume that if it
rejects "three" as the value of one xs:integer-typed element it will do
so for every element name you use, and leave that sort of problem to
random-input-generation fuzz testing.

Each element will have a minimum and a maximum value (the minInclusive
and maxInclusive facets, in XSD terms), so you will test at
those values, at one less, and one greater, and also at -1, 0 and 1, and
also at the median value, for each integer. That's ten per number.
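
To make the count of ten concrete, here is one way to write it down in
Python. The function name ten_values and the sample range 4 to 14 are
only illustrative, and "median" is taken to mean the midpoint of the
allowed range.

```python
def ten_values(min_value, max_value):
    """Return the ten boundary-style test values for one integer element."""
    median = (min_value + max_value) // 2   # midpoint of the allowed range
    return [
        min_value - 1, min_value, min_value + 1,   # around the minimum
        max_value - 1, max_value, max_value + 1,   # around the maximum
        -1, 0, 1,                                  # around zero
        median,
    ]


print(ten_values(4, 14))
```

If the allowed range sits close to zero, some of the ten will coincide,
so treat ten as an upper bound per element.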

So if the elements are all independent, 10^100 tests would be reasonable
as a minimum.

But this is not feasible (even if you can do one test per microsecond
you're looking at many, many thousands of years), and more likely the
elements are grouped, like shoe and sock size, and are not all
independent. If there are 10 groups, and you have 100 tests in each
unit, and a few thousand overall tests, you're likely in good shape.
If there are only 2 top-level units, and each has 50 totally independent
elements, you'd need many, many more tests.
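
The arithmetic behind those two cases, as a Python sketch. The figure of
3000 for "a few thousand" overall tests is my own placeholder, not a
recommendation.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# 100 independent elements, ten values each, every combination tested.
exhaustive_tests = 10 ** 100
# At one test per microsecond, convert the running time to years.
exhaustive_years = exhaustive_tests / 1_000_000 / SECONDS_PER_YEAR

# 10 groups with 100 tests each, plus "a few thousand" overall tests.
overall_tests = 3000   # assumption standing in for "a few thousand"
grouped_tests = 10 * 100 + overall_tests

print("exhaustive: about %.1e years" % exhaustive_years)
print("grouped:    %d tests" % grouped_tests)
```

The exhaustive case comes out at something over 10^86 years; the grouped
case is a few thousand tests in total.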

The number of tests needed is determined by how the values might
interact in your code.

> Thus, if my XML application consumes XML documents which conform to an
> XML Schema that is 100 lines long, then I should create 600 unit
> tests. Does that sound about right?

No.

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml
The barefoot typographer - http://www.holoweb.net/~liam/

References:
- How many unit tests should I create for my XML application?
  - From: "Costello, Roger L." <costello@mitre.org>
