Perils of Testing

All-Pairs Testing

At my previous job, I led a test team for a credit card payments application. Our customers were companies who issued payment cards to their employees for work-related expenses. The application was complex in the sense that there were many inputs and configuration settings. We hosted the application, so we knew that each company was configured differently, and that many user accounts were configured differently.

The test team had a copy of the production database (suitably scrubbed to minimize security/privacy issues), and so they had the opportunity to test using "real" data. When I started the job, one of the challenges the test team faced was deciding which configurations to test. There were too many combinations to test exhaustively. The team had tried different strategies to pick combinations. Sometimes we tested with companies who had reported a lot of problems. Other times we tested with our most important companies, i.e. those that generated the most revenue for us (we were paid a small percentage of however much money was spent on those company credit cards). Still other times we picked combinations at random.

After some research, I decided to try all-pairs testing. I'll give a brief description of the idea; see this page for some references to more detailed articles, and in particular this article for a good introduction. Think of all the data needed to represent a test case that leads to a bug -- the input values, configuration values, states, and so on -- as a tuple. Most bugs arise from the interaction of a subset of the tuple; in other words, parts of the tuple matter, and parts are don't-care values. In fact, the premise of all-pairs testing is that most bugs arise from either a specific value of one component of the tuple, or specific values of a pair of components. If your tuples are long, you can test multiple pairs simultaneously with the same tuple. If that premise is true, then it is worth asking how many tuples you need to test all possible pairs of values. There are more refinements to the idea, e.g. partitioning the tuple components into orthogonal parts, but what I said above is the gist of it.
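To make the idea concrete, here is a minimal sketch of a greedy all-pairs generator in Python. It is not the tool I used, and it is not an optimal algorithm: it simply enumerates every full tuple and repeatedly picks the one that covers the most still-uncovered pairs, which is only practical for small examples. The parameter names and values in `config` are made up for illustration.

```python
from itertools import combinations, product


def pairs_of(test, names):
    """Return every (parameter, value) pair combination that one test tuple covers."""
    return {((a, test[a]), (b, test[b])) for a, b in combinations(names, 2)}


def all_pairs(parameters):
    """Greedily pick tuples until every pair of values appears in at least one tuple.

    Simple sketch: enumerates the full cartesian product, so it only suits
    small specifications.
    """
    names = list(parameters)
    # Every value pair, across every pair of parameters, that must be covered.
    uncovered = {
        ((a, va), (b, vb))
        for a, b in combinations(names, 2)
        for va in parameters[a]
        for vb in parameters[b]
    }
    candidates = [dict(zip(names, values)) for values in product(*parameters.values())]
    tests = []
    while uncovered:
        # Choose the candidate tuple that covers the most pairs not yet covered.
        best = max(candidates, key=lambda t: len(pairs_of(t, names) & uncovered))
        tests.append(best)
        uncovered -= pairs_of(best, names)
    return tests


# Hypothetical configuration settings, invented for this example.
config = {
    "card_type": ["standard", "premium"],
    "currency": ["USD", "EUR", "GBP"],
    "approval_required": [True, False],
    "spending_limit": ["low", "high"],
}

tests = all_pairs(config)
exhaustive = 2 * 3 * 2 * 2  # 24 combinations if every tuple were tested
print(f"{len(tests)} pairwise tests instead of {exhaustive} exhaustive ones")
for t in tests:
    print(t)
```

Each tuple in the result exercises several pairs at once, which is why the list ends up shorter than the exhaustive one.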

There are open-source programs available for generating all-pairs tuples from a specification of a program's inputs/configuration settings. I don't remember which one I used, but if you Google around, you can find some.

I thought All-pairs was really useful. I found some bugs and my tests didn't take forever to run. Perhaps the most compelling thing about All-pairs is that it helped me convert an unmanageable problem (testing a huge number of combinations) into a manageable one (testing all pairs of values). All-pairs testing didn't eliminate every bug -- customers still found bugs now and then -- but at least we found a systematic way to attack the combination problem. Sometimes that's good enough.

Application To Events

Lately, I've been thinking about how to apply combinatorial testing to events rather than to inputs. Here's the idea. I want to test a new workflow. The developer designed the workflow with a particular set of event sequences in mind. I want to think about sequences that the developer may not have considered. I wonder if there's a simplifying strategy like All-pairs for generating event sequences for testing workflows. Any suggestions?

Randomization has its place in automated testing, but it can also indicate that the tester failed to determine the boundary conditions.

Here's an example. Imagine you're testing a function that takes an integer as input. Any input value seems as good as another. Rather than picking a value, you generate one randomly. You run the test and it passes, so you declare the function works.

Now, why would you use a random number generator rather than just picking a value? If one value is really as good as any other, wouldn't it be simpler to use a constant?

Perhaps the test logs all its inputs, and you want the values to differ so it's easier to distinguish one test from another. Fair enough; use a counter. After all, there is no guarantee that the random number generator will generate unique values, whereas a counter's values will be both unique and predictable.
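As a small illustration of the counter idea, here is a sketch in Python. The `run_case` function and its log format are hypothetical; the only point is that `itertools.count` hands out distinct, predictable values on every run.

```python
from itertools import count

# A counter yields 1, 2, 3, ... so every logged input is distinct and
# identical from one run to the next.
next_input = count(1)


def run_case(log):
    """Hypothetical test step: take the next input value and record it."""
    value = next(next_input)
    log.append(f"test input={value}")
    return value


log = []
inputs = [run_case(log) for _ in range(5)]
print(inputs)
print(log)
```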

Or perhaps you never got around to figuring out the boundary conditions. You figure using a random number generator increases the odds that you'll find a bug. Well, you might, but now you have an unrepeatable test. One time you run it, it works and you declare the code is ready to ship. The next time you run it, it fails. That's a lousy test. If you haven't figured out the boundary conditions, and you aren't in a position to try every possible value, you are better off looping over multiple values. It doesn't really matter which values you use since it's just a shot in the dark, but I suggest using a repeatable set. If you really want to use a random number generator, you can write a different program that generates a bunch of random values, and then hard-code those values into your test. But that seems like a lot of unnecessary work.
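Here is a rough sketch of what looping over a repeatable set might look like in Python. The function under test, `clamp_percent`, and the list of values are both invented for this example; the values are arbitrary, which is the point. They are a shot in the dark, but the same shot every time.

```python
def clamp_percent(n: int) -> int:
    """Hypothetical function under test: clamp an integer into the range 0..100."""
    return max(0, min(100, n))


# Hard-coded inputs. The particular values are arbitrary, but because they
# never change, a pass or a failure can be reproduced on the next run.
TEST_VALUES = [-1000, -1, 0, 1, 37, 99, 100, 101, 4096]


def run_tests():
    """Return a list of (input, result) pairs that violate the expected range."""
    failures = []
    for value in TEST_VALUES:
        result = clamp_percent(value)
        if not 0 <= result <= 100:
            failures.append((value, result))
    return failures


failures = run_tests()
print("failures:", failures)
```

If someone later finds a bug this test missed, the offending input can simply be appended to `TEST_VALUES`.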

If you don't know the boundary conditions, don't lean on randomization. You are better off having a test that's repeatable. Later on, if someone discovers a bug that the test did not catch, you can augment the test.


Thursday, April 1, 2010

event sequencing for software testing

Thursday, February 8, 2007

randomizing
