Detection of Rare Software Failures by Load Testing of Communication Software
LoTCoS - Load Testing of Communication Software
Testing the software of communication networks is a very complex task; in particular, it is extremely difficult to identify failures that occur only under very rare circumstances. On the one hand, they may occur too seldom in small networks to be detected during testing; on the other hand, large networks, in which such failures are likely to occur more often, are generally not available for extensive testing phases because of their cost. To support the timely identification of such software bugs, new methods are required that aim to maximize the reproducibility of real failure scenarios in a cost-effective manner.
Based on this need, the primary goal of the research project presented here is to check the effectiveness of load testing for communication software. The project builds on current experience in industry.
The following goals are part of the project:
Definition of the communication systems: First, communication systems are defined as the nodes of a network. Different networks of communication systems are analysed in the project. Fault classes that manifest themselves quite rarely, e.g. concurrency faults occurring only under very special race conditions, are identified and characterized. Load testing strategies are introduced and the goals of the research project are defined.
Based on the definitions of the network, an analytical model is derived.
Using stochastic methods and realistic assumptions, the probabilities of the different fault manifestations are determined analytically with respect to different load test strategies.
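To make the idea of an analytically determined manifestation probability concrete, the following is a minimal sketch and not the project's actual model. It assumes, purely for illustration, that events arrive as a Poisson process with a given rate (the load), and that a race-condition fault manifests whenever a competing event arrives within a short critical window. The function names, the window length and the rates are hypothetical.

```python
import math


def manifestation_probability(rate_per_s: float, window_s: float) -> float:
    """Probability that at least one competing event arrives inside the
    critical window, ASSUMING Poisson arrivals with the given rate."""
    return 1.0 - math.exp(-rate_per_s * window_s)


def detection_probability(p_single: float, opportunities: int) -> float:
    """Probability of at least one manifestation in a number of
    opportunities, ASSUMING the opportunities are independent."""
    return 1.0 - (1.0 - p_single) ** opportunities


if __name__ == "__main__":
    window_s = 0.001  # hypothetical critical window of 1 ms
    for rate in (1.0, 10.0, 100.0):  # hypothetical load levels (events/s)
        p = manifestation_probability(rate, window_s)
        d = detection_probability(p, opportunities=1000)
        print(f"rate={rate:6.1f}/s  p_single={p:.5f}  p_detect={d:.5f}")
```

Under these assumptions, a higher load raises the per-opportunity probability, which is the kind of dependence on the load test strategy that an analytical model would quantify.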
A simulation model is developed to validate the behaviour of the network of communication systems. This model allows connections through the network to be created and deleted, as well as the execution of predefined processes (like the storage of data) for existing connections. Measurement attributes, e.g. duration and number, are defined.
Faults of specified classes are injected into the system. Their frequency of manifestation is measured with respect to various load testing parameters (e.g. frequency of simulated events).
For each fault class the probability of failure occurrence derived from the analytical model is compared with the frequency of failure occurrence obtained by the simulation model. The comparison yields a range of load parameters, within which the analytical model can be validated by simulation.
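The comparison between analytical probability and simulated frequency can be illustrated with a heavily simplified sketch. It does not model a network of communication systems; it assumes a single injected fault that manifests when the gap between two consecutive events is shorter than a critical window, with exponentially distributed gaps. All names, the window length, the sample size and the 10 % relative tolerance are hypothetical choices made for this example.

```python
import math
import random


def analytical_probability(rate: float, window: float) -> float:
    """P(gap < window) for exponentially distributed inter-arrival gaps."""
    return 1.0 - math.exp(-rate * window)


def simulated_frequency(rate: float, window: float, n_events: int,
                        seed: int = 0) -> float:
    """Fraction of simulated gaps in which the injected fault manifests."""
    rng = random.Random(seed)  # fixed seed for reproducible runs
    hits = sum(1 for _ in range(n_events) if rng.expovariate(rate) < window)
    return hits / n_events


def compare(rates, window: float, n_events: int, rel_tolerance: float):
    """For each load level, return (rate, analytical, simulated, agrees)."""
    rows = []
    for rate in rates:
        p = analytical_probability(rate, window)
        f = simulated_frequency(rate, window, n_events)
        agrees = abs(p - f) <= rel_tolerance * p
        rows.append((rate, p, f, agrees))
    return rows


if __name__ == "__main__":
    # Hypothetical load levels in events per second and a 1 ms window.
    for rate, p, f, ok in compare((1.0, 10.0, 100.0, 1000.0),
                                  window=0.001, n_events=50_000,
                                  rel_tolerance=0.10):
        print(f"rate={rate:7.1f}/s  analytical={p:.5f}  "
              f"simulated={f:.5f}  agrees={ok}")
```

With a fixed number of simulated events, very low load levels produce few manifestations, so the simulated frequency becomes noisy relative to the analytical value. The load levels for which the two agree within the tolerance correspond, in this toy setting, to the range of load parameters within which the analytical model is validated.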
Based on the validated results, recommendations for optimizing load testing in real-world applications are derived. This guidance is meant to support the systematic definition of load testing parameters in practice, for the purpose of reproducing rare software failures when testing communication systems.