Monday, August 12, 2013

Hate random test failures

Here is the scenario I was facing, and I suspect many of you face it too. Your regression build keeps failing because of some random test failure. When you rerun the test locally (or on the build machine itself), it passes. Each time, you spend an hour or two setting things up and rerunning the failed test, only to find no real defect.

Over time, a dirty "fact" gets registered in your mind and in your team's mind: "The build is failing because of a random test failure, not because of an application issue."
Then, when a real application defect comes in, the tests fail and, alas, we mistake it for a random failure.

Here are some rough numbers from one of my projects that suffered from this issue. At a high level, there were 400 tests, of which only 150 were passing. On deeper analysis, we found that of the 250 failing tests:
  • 60 were tests for outdated application features.
  • 84 were failing due to application defects (14 defects in total).
  • The remaining 106 were failing because of genuine problems in the tests themselves.
The test suite failed to catch the application defects, which was its only responsibility. It needed a hell of a lot of maintenance: removing tests for outdated features and working out which tests were failing for a genuine reason and which were not. So it is fair to say that the value of a failing test suite is zero. Likewise, any effort spent adding new tests to it is wasted until the currently failing tests are fixed.

Here is a quick (and somewhat dirty) fix for the above issue. Create a downstream project that picks up the failed tests and runs them on their own once your regression run completes. A randomly failing test is likely to pass on a second run (in fact, in one of my projects, all the random failures passed here). So the downstream project becomes your indicator of test suite sanity.
This gives you some breathing room to find and fix the real cause behind the randomly failing tests.
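The downstream rerun step can be sketched roughly as follows. This is a minimal illustration, not a CI plugin: it assumes the regression results are available as a mapping from test name to pass/fail, and that you have some way to run a single test by name. The names `rerun_failed` and `run_test` are hypothetical; in a real setup the results would come from your test runner's report and `run_test` would invoke that runner for one test.

```python
from typing import Callable, Dict, List, Tuple


def rerun_failed(
    results: Dict[str, bool],
    run_test: Callable[[str], bool],
    attempts: int = 1,
) -> Tuple[List[str], List[str]]:
    """Re-run only the tests that failed in the main regression run.

    Returns (flaky, still_failing):
      flaky         -- failed in the regression run but passed on a rerun
      still_failing -- failed on every rerun, so likely a real problem
    """
    flaky: List[str] = []
    still_failing: List[str] = []
    for name, passed in results.items():
        if passed:
            continue  # only failed tests are handed to the downstream run
        # any() stops at the first passing attempt, so a test is rerun
        # at most `attempts` times.
        if any(run_test(name) for _ in range(attempts)):
            flaky.append(name)
        else:
            still_failing.append(name)
    return flaky, still_failing


if __name__ == "__main__":
    regression = {"login": True, "search": False, "checkout": False}
    # Hypothetical runner: "search" passes on rerun, "checkout" keeps failing.
    outcomes = {"search": True, "checkout": False}
    flaky, failing = rerun_failed(regression, lambda name: outcomes[name])
    print(flaky, failing)  # ['search'] ['checkout']
```

With this split, the downstream build would fail only when `still_failing` is non-empty, while the tests in `flaky` are the candidates to investigate for the causes of randomness.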

The following might be the reasons behind the randomness:
  • Timing issues, such as waiting for a page or control to load or disappear. Please don't add blind sleeps. Build a feature into your framework that waits for a condition, and state exactly which condition it is waiting for.
  • Application behaviour. Sometimes your application itself behaves randomly. Capturing screenshots for failing tests helps you pin down such issues.
  • Random generators used for test data. Some teams create test data randomly. Avoid this. If you rely on random data to find a defect, chances are it may never show up at all.
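A "wait for a condition" helper, as suggested in the first point above, might look something like this minimal sketch. The name `wait_until` and its parameters are hypothetical, not part of any particular framework. It polls a condition until it holds or a timeout expires, and the timeout error names the exact condition that was awaited. If you use Selenium WebDriver, its `WebDriverWait` class already provides this kind of explicit wait.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    description: str,
    timeout: float = 10.0,
    poll_interval: float = 0.2,
) -> None:
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Unlike a blind sleep, this returns as soon as the condition holds, and
    on timeout it raises TimeoutError naming what it was waiting for.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Timed out after {timeout}s waiting for: {description}"
            )
        time.sleep(poll_interval)


if __name__ == "__main__":
    start = time.monotonic()
    # Stand-in condition; in a UI test this would be something like
    # "the loading spinner is no longer visible".
    wait_until(lambda: time.monotonic() - start > 0.3,
               "0.3 seconds to pass", timeout=2.0, poll_interval=0.05)
    print("condition met")
```

Because the description travels into the error message, a timing failure reports which condition never became true rather than surfacing later as an unrelated assertion failure.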
