[ main page ] [ back ]

2010 : Analysis of On-Chip Fault-Tolerant Distributed Algorithms

Author(s)
Matthias Fuegger
Abstract
For Very Large Scale Integrated (VLSI) Circuits intended to be used in highly reliable applications, formal specification and analysis is mandatory. Two trends in VLSI design favour a modeling approach analogous to that used for distributed systems: (i) noticeable communication delays between circuit components and (ii) increasing failure rates caused by wear-out and particle hits in circuits with ever decreasing feature sizes. Despite these striking similarities, specifying and analyzing circuits by means of classic distributed system models is either overly lengthy or not possible: In contrast to classic distributed systems, where computations are performed by processes in discrete steps in time, digital circuits in general adhere to a continuous computation model. An on-chip algorithm thus becomes a set of circuit components that compute on continuous streams of events and communicate by continuous streams of messages. In this thesis a modeling and analysis framework tied to the peculiarities of faulttolerant on-chip algorithms is presented. It is based on a three-fold representation of the behavior of a circuit’s outputs over time. The most fundamental is the signal, a trace of events; next comes the more abstract status, grouping together similar signals, and finally the counting function, grouping together an even larger set of similar signals. The threefold representation allows the specification of diverse circuit components at different levels of abstraction. The capabilities of this framework are then illustrated by applying it to clockless onchip algorithms, that is, circuits that are not driven by a central clock. The framework is extended by a Petri net like specification language, which is used to state pivotal circuit components for building clockless fault-tolerant on-chip algorithms. Among those is the General Join module, a module that allows to merge data provided by different sources in a fault-tolerant manner. In the thesis a complete specification is provided and generic timing properties are derived. Furthermore, an implementation of a General Join module in terms of simpler circuit components is given and proven correct. In contrast to clockless circuits, synchronous circuits are driven by a central clock which inherently constitutes a single-point of failure. A common technique to make synchronous circuits fault-tolerant is by replication of the circuit and its clock source. Thereby, the problem arises to provide fault-tolerant, synchronized clock signals that do not diverge over time to each of the replicas. This problem is termed the “tick generation” problem. It is shown that an alternative to replicated synchronized clock sources is to let a set of General Join modules, forming an on-chip distributed algorithm, generate synchronized clock signals in the course of their interaction. A correctness proof and performance measures of this solution are derived. Finally limitations of a restricted set of on-chip algorithms are established: It is shown that there is no circuit, only comprising combinatorial gates and wires with constant delay that solves the short-pulse-filter problem, a problem of great importance when building state-holding devices like arbiters that are not allowed to glitch at their output. The thesis concludes with a probabilistic by-pass of these limitations.
Bibtex
@phdthesis{ fuegger:2010,
  author =      "Matthias F{\"u}gger",
  title =       "Analysis of On-Chip Fault-Tolerant Distributed Algorithms",
  address =     "Treitlstr. 3/3/182-2, 1040 Vienna, Austria",
  school =      "Technische Universit{\"a}t Wien, Institut f{\"u}r Technische Informatik",
  year =        "2010"
}
Download
Get diss_fuegger.pdf - Adobe PDF-format, (1528.5352 KB; posted at July 09 2013)


[ main page ] [ back ]