An embedded real-time system, as its name implies, combines the characteristics of an embedded system and a real-time system. An embedded system is a special-purpose system, often built around a single processor, that is generally not user-programmable. A real-time system is one that must perform its operations within rigid timing constraints. In many cases, embedded real-time systems must provide long periods of uninterrupted service in harsh and dynamic environments. The importance of reliability in such systems will increase dramatically as computers take a more active role in everyday life and in industrial sectors. In addition, transient faults in semiconductor devices are becoming more significant because of increased device density, lower supply voltages, and faster switching signals.
This thesis deals with fault-tolerance techniques for coping with transient faults in embedded real-time systems. Transient faults are usually overcome using time redundancy, and a typical implementation of time redundancy is checkpointing. Thus, checkpointing problems in embedded real-time systems are explored from a reliability point of view.
A reliability model of a static equidistant checkpointing scheme with non-concurrent fault detection mechanisms is derived. With non-concurrent fault detection, faults are detected by check mechanisms that are performed at regular intervals, so some detection latency is inevitable. In deriving the reliability model, the average lifetime of transient faults is taken into account, and the derived model is verified by simulation. Based on the reliability model, sufficient conditions under which a static equidistant checkpointing scheme is beneficial are discussed. In addition, an optimal strategy that maximizes system reliability is proposed.
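To make the idea of a reliability model for equidistant checkpointing concrete, the following is a minimal sketch under strongly simplified assumptions, not the model derived in the thesis. It assumes that transient faults arrive as a Poisson process, that a fault is instantaneous (fault lifetime is ignored), that faults are detected only by a check at the end of each segment, and that a detected fault causes a rollback to the previous checkpoint. Reliability is taken as the probability that the task completes before its deadline. All function and parameter names (`analytic_reliability`, `simulated_reliability`, `work`, `overhead`, `rate`, `deadline`) are hypothetical and chosen for illustration.

```python
import math
import random


def analytic_reliability(work, n, overhead, rate, deadline):
    """Probability of finishing by the deadline in the simplified model.

    work     : fault-free execution time of the task
    n        : number of equidistant segments (checkpoints)
    overhead : time for one checkpoint plus its check (assumed constant)
    rate     : Poisson arrival rate of transient faults
    deadline : time by which the task must complete
    """
    seg = work / n + overhead                # duration of one segment attempt
    max_runs = int(deadline / seg + 1e-9)    # attempts that fit in the deadline
    if max_runs < n:
        return 0.0
    p = math.exp(-rate * seg)                # one attempt sees no fault
    spare = max_runs - n                     # re-executions that can be afforded
    # Negative binomial: n successful attempts with at most `spare` failures.
    return sum(
        math.comb(n - 1 + k, k) * p**n * (1.0 - p) ** k
        for k in range(spare + 1)
    )


def simulated_reliability(work, n, overhead, rate, deadline, trials, seed=0):
    """Monte Carlo estimate of the same probability, used as a cross-check."""
    rng = random.Random(seed)
    seg = work / n + overhead
    max_runs = int(deadline / seg + 1e-9)
    p = math.exp(-rate * seg)
    successes = 0
    for _ in range(trials):
        done = 0
        runs = 0
        while done < n and runs < max_runs:
            runs += 1
            if rng.random() < p:             # segment passed its check
                done += 1                    # otherwise roll back and retry
        if done == n:
            successes += 1
    return successes / trials


if __name__ == "__main__":
    params = dict(work=100.0, overhead=1.0, rate=0.01, deadline=150.0)
    # Brute-force search over n; a stand-in for an optimal strategy.
    best_n = max(range(1, 21), key=lambda n: analytic_reliability(n=n, **params))
    print("best n:", best_n)
    print("analytic :", analytic_reliability(n=best_n, **params))
    print("simulated:", simulated_reliability(n=best_n, trials=20000, **params))
```

In this sketch, too few checkpoints make each re-execution expensive, while too many let the checkpoint overhead consume the slack before the deadline, so reliability peaks at an intermediate number of segments. The brute-force search over `n` only illustrates that trade-off and should not be read as the optimal strategy proposed in the thesis.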
Concurrent fault detection mechanisms can detect faults with significantly less detection latency than non-concurrent mechanisms do. Accordingly, a sy...