Our confidence in software systems depends on our confidence in the exhaustiveness of our testing. As software systems grow more complex, exhaustive testing becomes harder and, in some cases, infeasible. To build less error-prone systems, we therefore need to focus not only on identifying bugs quickly and efficiently through testing and verification, but also on identifying the factors associated with bugs so that we can prevent them from occurring in the first place. This thesis shows how mutation testing and fault prediction can be used to improve the quality of software.
In the first part of this thesis, we investigate the notion of testedness and how it relates to two widely used measures of test suite quality. The first measure is statement coverage, the simplest and best-known code coverage measure. The second is mutation score, a supposedly more powerful, though more expensive, measure. We evaluate these measures against the actual criterion of interest: if a program element is (by these measures) well tested at a given point in time, it should require fewer future bug-fixes than a "poorly tested" element. If it does not, then it seems likely that we are not effectively measuring testedness. We show that both statement coverage and mutation score have only a weak negative correlation with bug-fixes, with mutation score showing the slightly stronger correlation of the two.
In the second part, we investigate the applicability of mutation analysis to complex real-world software systems. Despite four decades of research on mutation analysis, its use in large systems is still rare, in part because of its computational requirements and high numbers of false positives. We present our experience using mutation analysis on the Linux kernel's RCU (Read-Copy-Update) module, where we adapt existing techniques to constrain the complexity and computational requirements. We show that mutation analysis can be a useful tool, uncovering gaps in even a well-tested module like RCU. This experiment has not only led to the identification of gaps and bugs in the RCU module, but has also prompted a discussion on identifying domain-specific mutants in order to increase the applicability of mutation analysis in real-world applications.
In the third part, we investigate fault prediction models. Although there has been much research on predicting failures, those predictions usually concentrate on either the technical or the social side of software development. However, software development is not an isolated activity; it requires coordination between individuals and technology. Therefore, to attain the best possible predictive capability, we need to analyze the complex interactions between socio-technical factors. Using one such socio-technical factor, merge conflicts, we found a significant improvement in fault prediction.