38/1994 : A measuring method for Fault Injection Experiments in Computing Systems
Dissertationen der Technischen Universität Wien (66)
The main goal of this thesis is the development of a measuring methodology for fault injection experiments. Considerations are focused on fault injection for the evaluation and comparison of various error detection mechanisms that are implemented in hardware. The study of related research shows that many experimental results have been published on this topic, but they cannot be compared because they rest on different assumptions. In order to stimulate standardization in this area, a suggestion for a common experimental setup is worked out, based on an analysis of the fault injection process. Single transient faults with a duration of 1 to 100 system clock cycles are found to be most suitable, unless application-specific demands on the fault hypothesis exist. Location and level of injection are varied throughout our experiments.
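The fault hypothesis above can be illustrated with a minimal sketch. It is not taken from the thesis: the class name `TransientFault`, the function `sample_fault`, the signal names, and the uniform sampling are all assumptions made for illustration. Only the restriction to a single transient fault lasting 1 to 100 clock cycles comes from the abstract.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TransientFault:
    """One single transient fault (hypothetical record layout)."""
    location: str         # injection location, e.g. a pin name (assumed)
    start_cycle: int      # clock cycle at which the fault begins
    duration_cycles: int  # fault duration in system clock cycles


def sample_fault(locations, max_start_cycle, rng):
    """Draw one fault; uniform sampling is an assumption of this sketch."""
    return TransientFault(
        location=rng.choice(locations),
        start_cycle=rng.randrange(max_start_cycle),
        # Duration range of 1 to 100 cycles, as stated in the abstract.
        duration_cycles=rng.randint(1, 100),
    )


if __name__ == "__main__":
    rng = random.Random(1994)  # fixed seed for reproducible experiments
    print(sample_fault(["D0", "A15", "CLK", "RESET"], 10_000, rng))
```

Fixing the random seed is one simple way to make an injection campaign repeatable.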
A survey of commonly used fault injection methods is presented. In particular, the attainable granularity of injection time and location is identified as an important aspect for a comparison, since well-aimed triggered injection is proposed to increase fault activation. For this reason pin-level injection is preferred, with other methods used as a supplement. Different interpretations of common evaluation measures are discussed and new measures - in particular for comparison of mechanisms - are introduced. The resulting demands on system monitoring during experiments are defined in detail.
Next, a comprehensive measuring model is developed that structures the system in two dimensions: a vertical structure (layer model) describes system function on different levels of abstraction, while a horizontal structure (unit model) divides the system into a number of functional units. Both structures are primarily intended to reduce system complexity and to improve the understanding of system function in general and of the error propagation process in particular. In addition, several further applications of the measuring model are worked out and discussed as far as they are directly connected to fault injection experiments.
Finally, our methodology is validated by applying it to an actual measuring problem: experimental evaluation of the error detection mechanisms of a single-board computer. This single-board computer is equipped with a number of different error detection mechanisms and was developed primarily for the purpose of our experiments. During its design we were careful to support the use of diverse fault injection methods and to facilitate comprehensive data collection. A tandem system with two independent processors is located on one board. One processor is used as the target for fault injection while the other can be used as a clock-synchronous reference.
As a practical realization of the injection methodology developed, an injection toolset is prepared for our experiments. It consists of a programmable pattern generator for the actual fault injection, a state analyzer that is used for triggering the pattern generator, and a timing analyzer for the collection of readouts. Fault injection and data collection are controlled by support software on a PC and are fully automated. Evaluation software calculates the required measures from data recorded during experiments. A combined evaluation of groups of error detection mechanisms is directly supported.
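A combined evaluation of a group of mechanisms can be sketched as follows. The abstract does not define the thesis's measures, so this example assumes the commonly used notion of detection coverage: the fraction of activated faults detected by at least one mechanism in the group. The function name `group_coverage`, the record layout, and the mechanism names are hypothetical.

```python
def group_coverage(records, group):
    """Fraction of activated faults detected by any mechanism in `group`.

    `records` holds one set per activated fault, containing the names of
    the error detection mechanisms that fired (assumed record layout).
    """
    if not records:
        raise ValueError("no activated faults recorded")
    group = set(group)
    # A fault counts as detected if at least one group member fired.
    detected = sum(1 for fired in records if fired & group)
    return detected / len(records)


if __name__ == "__main__":
    # Four activated faults with hypothetical mechanism names.
    records = [{"parity"}, {"watchdog"}, set(), {"parity", "watchdog"}]
    print(group_coverage(records, ["parity"]))
    print(group_coverage(records, ["parity", "watchdog"]))
```

Evaluating groups this way shows how much a mechanism adds beyond the others, which a per-mechanism figure alone does not reveal.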
During our experiments the target computer alternately executes one of two simple workloads. The relevant characteristics of both workloads are studied in order to understand their influence on the results.
More than 30,000 fault injections have been performed with the toolset. Excellent reproducibility of experiments can be achieved in spite of some limitations of the equipment currently used. Injections on the clock line and on the reset line demonstrate the usefulness of the evaluation parameters defined. The absence of latent errors indicates the high degree of activation achieved. The trigger capabilities of the system are analyzed and their advantages are demonstrated by an experimental comparison with an untriggered scenario. In both cases the model successfully predicts activation.
author = "Andreas Steininger",
title = "A measuring method for Fault Injection Experiments in Computing Systems",
journal = "Dissertationen der Technischen Universität Wien (66)",
year = "1994",
month = "Jan."