Log-based rollback recovery without checkpoints of shared memory in software DSM

A common approach to fault-tolerant software DSM is to take checkpoints with message logging. Our remote logging has low overhead because each node saves coherence-related data into the memory of a remote node through a high-speed system area network. To make fault-tolerant DSM more lightweight, this paper focuses on eliminating shared-memory checkpointing during failure-free execution. Each node independently checkpoints only its execution state and non-shared data. When a node fails, it regenerates its pages from the remote copies held by live nodes. To reconstruct pages efficiently, we also introduce an XOR-diffing technique. The diff logs, created by XOR operations during failure-free execution, can be applied to any version of a remote copy, either backward or forward, for recovery. Our scheme reduces checkpointing overhead and also alleviates the imbalance in execution times among nodes caused by independent checkpointing.
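The backward-or-forward property of XOR diffs can be illustrated with a minimal sketch. The abstract does not specify the log format or page layout, so everything here is an assumption made for illustration: pages are modeled as fixed-size byte strings, and the names `PAGE_SIZE`, `xor_diff`, `apply_diff`, and `roll` are hypothetical, not taken from the paper. The sketch relies only on the fact that XOR is its own inverse, so the same diff moves a page from the older version to the newer one and back again.

```python
PAGE_SIZE = 16  # assumed toy page size; real DSM pages are much larger


def xor_diff(old: bytes, new: bytes) -> bytes:
    """Return the byte-wise XOR of two equally sized page versions."""
    if len(old) != len(new):
        raise ValueError("page versions must have the same length")
    return bytes(a ^ b for a, b in zip(old, new))


def apply_diff(page: bytes, diff: bytes) -> bytes:
    """Apply an XOR diff to a page; the direction depends only on the input version."""
    return xor_diff(page, diff)


def roll(page: bytes, diffs: list) -> bytes:
    """Apply a sequence of XOR diffs in the given order."""
    for diff in diffs:
        page = apply_diff(page, diff)
    return page


# Three successive versions of one page (hypothetical contents).
v0 = bytes(PAGE_SIZE)
v1 = b"A" * 4 + bytes(PAGE_SIZE - 4)
v2 = b"A" * 4 + b"B" * 4 + bytes(PAGE_SIZE - 8)

# Diff logs as they might be produced during failure-free execution.
d01 = xor_diff(v0, v1)
d12 = xor_diff(v1, v2)

# Forward: an older remote copy is brought up to the latest version.
assert roll(v0, [d01, d12]) == v2
# Backward: a newer remote copy is rolled back using the same logs.
assert roll(v2, [d12]) == v1
assert roll(v2, [d12, d01]) == v0
```

In this toy model, a recovering node could start from whichever version a live node happens to hold and apply the logged diffs in either direction to reach the version it needs. How the actual scheme selects remote copies and orders diffs is not described in the abstract.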
Publisher
SPRINGER
Issue Date
2006-02
Language
English
Article Type
Article
Citation

JOURNAL OF SUPERCOMPUTING, v.35, no.2, pp.141 - 154

ISSN
0920-8542
DOI
10.1007/s11227-006-1667-7
URI
http://hdl.handle.net/10203/4684
Appears in Collection
CS-Journal Papers