The probability of multiple failures in Software Distributed Shared Memory (SDSM) increases as the system size grows. The most popular technique for Fault-Tolerant Software Distributed Shared Memory (FT-SDSM) is currently message logging, which stores the messages exchanged between communicating nodes in suitable storage and combines them with independent checkpointing. Despite its popularity, logging incurs non-negligible overhead during failure-free execution, and in recent years a considerable amount of research has focused on reducing that overhead.
We have implemented a lightweight logging scheme on home-based SDSM, called remote logging, in which the memories of remote nodes hold the logged data. Although the scheme is lightweight, the data logged in home nodes and back-up nodes serve no purpose during failure-free execution.
We propose new log usable schemes that enhance FT-SDSM with remote logging. All data logged in home nodes and back-up nodes are put to use, both to reduce the time a node stalls while an invalid page is updated and to shorten normal execution time. On a page fault, the invalid page can be updated from the logged data instead of fetching the whole page from its home node.
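The page-fault path described above can be illustrated with a minimal sketch. It rests on assumptions not stated in this abstract: that the logged data take the form of diffs (runs of modified bytes with their offsets), that a page is 4096 bytes, that each run carries an 8-byte header, and that the node falls back to a full fetch when the log is empty or larger than a page. All names (`make_diff`, `apply_diffs`, `update_on_fault`, `fetch_page`) are hypothetical and do not come from the implementation.

```python
PAGE_SIZE = 4096   # assumed page size in bytes
RUN_HEADER = 8     # assumed per-run header (offset + length), in bytes


def make_diff(old: bytes, new: bytes):
    """Return a list of (offset, data) runs where `new` differs from `old`."""
    assert len(old) == len(new)
    runs, i, n = [], 0, len(old)
    while i < n:
        if old[i] != new[i]:
            start = i
            while i < n and old[i] != new[i]:
                i += 1
            runs.append((start, bytes(new[start:i])))
        else:
            i += 1
    return runs


def apply_diffs(page: bytearray, diffs) -> None:
    """Patch `page` in place with the logged runs."""
    for offset, data in diffs:
        page[offset:offset + len(data)] = data


def diff_size(diffs) -> int:
    """Bytes that would be transferred to ship these runs."""
    return sum(RUN_HEADER + len(data) for _, data in diffs)


def update_on_fault(stale: bytearray, logged_diffs, fetch_page) -> int:
    """Bring a stale page up to date; return the number of bytes moved.

    Uses the logged diffs when they are available and smaller than a page,
    otherwise fetches the whole page through the `fetch_page` callback.
    """
    cost = diff_size(logged_diffs)
    if logged_diffs and cost < PAGE_SIZE:
        apply_diffs(stale, logged_diffs)
        return cost
    stale[:] = fetch_page()
    return PAGE_SIZE


# Usage: the home copy changed 8 bytes since the local copy was made.
home = bytearray(PAGE_SIZE)
stale = bytearray(home)
home[100:108] = b"ABCDEFGH"
log = make_diff(bytes(stale), bytes(home))
moved = update_on_fault(stale, log, lambda: bytes(home))
```

In this toy case the stale page is repaired by moving 16 bytes rather than 4096, which is the kind of saving in message volume that the schemes aim for; the actual savings depend on the application's write pattern.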
We implemented the proposed schemes on FT-SDSM with remote logging and performed experiments on eight PC clusters. The results show that, for some applications, the proposed log usable schemes outperform our base FT-SDSM, which makes no use of the logs. With all of the proposed schemes applied together, the number of messages falls by about 5 - 12 % and the message volume by about 11 - 78 %; consequently, total execution time is about 13 % lower than that of the base model in the best case.