.gEAr - Dataflow oriented testcase generation with Evolutionary Algorithms
In spite of remarkably improved methods in software development, the increasing complexity of modern software systems still represents a main hindrance towards fault-free software development. Size and budget of today’s projects mostly forbid a complete formal verification, which in turn covers only the logical domain of the code but not the real environment, even in cases where verification is feasible. On this account, testing is still considered to be an important means of quality assurance previous to the final release. In order to increase the chances of revealing faults during the testing phase, test cases are selected according to different strategies: for functional testing, inputs are derived from the specified requirements, whilst for structural testing the code has to be executed as exhaustively as possible. In consequence of the complexity of the control structures, the identification of test data allowing to achieve high dataflow coverage is exceedingly time consuming. Checking the correctness of the test results has usually to be carried out by hand and therefore requires a high amount of effort. Therefore, the overall number of test cases is crucial, even if their generation can be performed at low cost, e. g. by means of a random generator.
The aim of the ongoing research project is to automate the generation of adequate sets of dataflow-oriented test cases and to evaluate the adequacy of different techniques with respect to both quality achieved and effort required. In various application areas evolutionary algorithms have revealed as providing suitable search and optimisation strategies. Previous related investigations based on genetic algorithms were restricted to simple control flow criteria (statement and branch coverage), to special purpose programming languages or to language subsets hardly transferable to real projects.
The research project deals with the following objectives:
In order to identify the definition/use-pairs actually covered during the execution of a test case, a tool for monitoring the dynamic execution of a program is developed. It is intended to instrument the test object in such a manner that all relevant dataflow information is logged during each test run.
Based on the dynamic assessment of the coverage achieved by a test set, a procedure is developed aiming at generating an optimal set of test cases using evolutionary algorithms. The fitness function for evaluating each test set depends both on the number of covered dataflow pairs (see dynamic analysis) and on the size of the test set. For this reason the task is considered as a global optimisation problem independent from the control flow structure of the test object. For the actual generation of the test data, different genetic operators are developed to be used in parallel in the context of adaptive evolutionary algorithms. The different options are implemented in a toolset to be tested and evaluated.
The evaluation of the results determined by the evolutionary algorithm requires the knowledge of both the coverage actually achieved (see dynamic analysis) and the coverage attainable. For this purpose a static analyzer has to locate the respective definitions and uses of each variable in the data flow graph. Together with the results achieved by the dynamic analysis, the static analysis is meant to improve the stopping criterion for the global optimisation.
From the localisation of the data flow pairs done so far (see static analysis) additional test paths have to be derived in order to fulfil the testing criterion. The corresponding test cases are to be generated in a local optimisation step using evolutionary algorithms. This requires the definition of a new fitness function based on the graph-theoretic characteristics of the control flow as well as the enhancement of the instrumentation of the test object already implemented (dynamic analysis). Consequently, the automatic generation of test cases is locally directed and enhanced with respect to time effort and data flow coverage achieved.
On the basis of the procedures and tools developed so far, it has to be evaluated how the investigations mentioned above can be extended and enhanced using domain knowledge in order to support the heuristics (e.g. slicing in hybrid evolutionary algorithms) as well as applying distributed respectively self adaptive evolutionary systems.