Real-time systems are those that must execute all tasks within their timing constraints. The increasing use of computer systems in many vital aspects of everyday life, and the growing dependence on these systems, make fault-tolerant computing essential in many of them. Meanwhile, recent progress in VLSI hardware technology has made faults in the hardware itself less frequent, so that momentary controller malfunctions caused by transient faults now account for the majority of system failures. In addition, recent developments in microprocessors and DSPs have provided enormous computing capacity at reasonably low cost. Hardware and temporal redundancy are therefore commonly used to make a system fault tolerant.
In this dissertation, we study fault-tolerance methodologies for real-time control systems that use modular and temporal redundancy in the presence of transient faults. We propose DMTR (Dual-Modular Temporal Redundancy), which combines a checkpointing strategy for temporal redundancy with a dual-modular redundancy structure for hardware redundancy.
First, we introduce the basic concept of the DMTR system and describe the DMTR strategy. Using a discrete Markov model, we formulate the STPMs (State Transition Probability Matrices) for a given single task and set of environment parameters. We then model and analyze the reliability of the DMTR system, taking into account the write-back and/or update-with-communication overhead time, in the presence of both independent and correlated transient faults. We also find the optimal number of subslots (which is related to the checkpointing interval) for maximum reliability through numerical evaluation of the formulated system model.
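To make the idea of evaluating reliability from an STPM concrete, the following is a minimal sketch of a toy discrete Markov model, not the DMTR model developed in this dissertation. It assumes that a task needs `n_subslots` fault-free subslots, that each subslot is independently hit by a detected transient fault with probability `p_fault` (forcing a retry from the last checkpoint), and that overhead times and correlated faults are ignored. The function names and parameters are hypothetical and chosen only for illustration.

```python
def build_stpm(n_subslots, p_fault):
    """Build a toy state transition probability matrix.

    State i means "i subslots completed and checkpointed".
    State n_subslots is absorbing (task finished).
    Assumption: a fault repeats the current subslot, nothing more.
    """
    size = n_subslots + 1
    stpm = [[0.0] * size for _ in range(size)]
    for i in range(n_subslots):
        stpm[i][i] = p_fault            # fault detected: redo this subslot
        stpm[i][i + 1] = 1.0 - p_fault  # no fault: advance to next checkpoint
    stpm[n_subslots][n_subslots] = 1.0  # finished task stays finished
    return stpm


def reliability(n_subslots, slots_available, p_fault):
    """Probability that the task completes within slots_available subslots."""
    stpm = build_stpm(n_subslots, p_fault)
    size = n_subslots + 1
    dist = [1.0] + [0.0] * n_subslots   # start with no subslot completed
    for _ in range(slots_available):
        # One step of the chain: row vector times the STPM.
        dist = [sum(dist[i] * stpm[i][j] for i in range(size))
                for j in range(size)]
    return dist[-1]


if __name__ == "__main__":
    # Hypothetical numbers: 2 subslots of work, 3 subslots before the deadline.
    print(reliability(n_subslots=2, slots_available=3, p_fault=0.1))
```

In this toy setting, reliability is the probability mass in the absorbing "finished" state once the time budget is used up. A fuller model would add states and transition probabilities for overhead and correlated faults, which this sketch deliberately leaves out.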
Second, we formulate a reliability model of the DMTR system with multiple harmonic tasks. For this analysis, we consider the hyper-period, in which every task is executed an integral number of times, calculate the STPMs for each task, and formulate the reliability model for the DMTR system based on these STPMs. Since this formulation...