2012 : Self-Healing Asynchronous Circuits for High-Reliability Applications

Author(s)
T. Panhofer
Abstract
New generations of integrated programmable logic devices offer more and more resources, which makes them very attractive for implementing even complete systems on chip. Advancing miniaturization, higher integration, continuously decreasing supply voltage and other changing parameters lead to a situation where fault effects that previously were an issue only in extremely harsh environments, e.g. space missions, now also affect circuits in “normal” environments. At the same time the probability of multiple faults occurring during operation is increasing. This diverging evolution, increasing complexity vs. decreasing (system) reliability, is becoming a serious problem for high-reliability applications. While many methods exist to handle transient faults, there are no consolidated concepts available for permanent faults. Traditional fault tolerance concepts, e.g. triple modular redundancy (TMR), are usually costly in terms of hardware resources, mass and power consumption. Furthermore, for highly complex systems it is difficult to predict the failure modes. In particular for those high-reliability applications where a repair is very expensive or even impossible, the trend goes towards adaptive systems that can autonomously cope with failure situations as they arise. In this thesis a self-healing concept for integrated digital logic is presented. The approach is based on asynchronous circuits and uses a redundant pipeline as its basic circuit structure. Combinational logic is replaced by reconfigurable Self-Healing Cells (SHC). The inherent properties of the asynchronous design style FSL simplify the design of a fault-tolerant system, as it features e.g. fail-stop behavior without additional effort. A watchdog circuit monitors the circuit’s activity and, in case of a deadlock, triggers the reconfiguration controller to start the circuit reconfiguration. As soon as a valid data and acknowledge path is established, the pipeline autonomously starts working again.
In general, this procedure works without loss or corruption of data. However, the pipeline structure and the applied reconfiguration algorithm influence the sensitivity to timing effects and the probability of a successful repair. To verify the function of the concept, a VHDL model of the self-healing pipeline as well as of several different reconfiguration controllers was designed. In addition, an abstract Matlab model was established and used for exhaustive fault injection simulations. Finally, the circuits were implemented in a Xilinx Virtex-4 FPGA and hardware fault injection experiments were performed. All models used the same stimulus interface, so that identical situations could be investigated and compared on different abstraction levels. The results support the suitability of the approach for increasing the fault tolerance of integrated circuits: all single faults, more than 80% of the double faults and nearly 60% of triple faults can be tolerated by the developed concept, while introducing a hardware overhead comparable to that of a TMR system.
Bibtex
@phdthesis{ :2012,
  author =      "T. Panhofer",
  title =       "Self-Healing Asynchronous Circuits for High-Reliability Applications",
  address =     "Treitlstr. 3/3/182-1, 1040 Vienna, Austria",
  school =      "Technische Universit{\"a}t Wien, Institut f{\"u}r Technische Informatik",
  year =        "2012"
}