Distributed shared memory (DSM) is a viable architecture for scalable, high-performance multiprocessor systems. Directory-based cache coherence protocols are widely used in hardware DSM systems, but they suffer from the overhead of processing coherence messages. Speculative coherence schemes aim to reduce this overhead by speculatively changing the state of a cached block in a node. Because dynamic schemes can reflect the run-time behavior of application programs without user intervention, they are a desirable form of speculative coherence despite their hardware overhead. However, existing proposals for dynamic speculative coherence have several disadvantages. Dynamic self-invalidation (DSI), the first such approach, lacks the ability to predict the timing of coherence actions. The last-touch predictor (LTP) overcomes this limitation, but it requires a special integrated processor that includes both the DSM controller and the predictor. Moreover, it performs poorly for irregular access patterns such as competing accesses in synchronization constructs and false sharing. Existing schemes also cannot differentiate write patterns, and they perform poorly for non-migratory write patterns.
In this dissertation, we propose speculative coherence using decoupling synchronization (SCDS), which predicts the timing and type of coherence actions based on synchronization information. SCDS exploits the observation that conflicting accesses to a block are usually decoupled by a synchronization operation. SCDS does not need to observe every memory access in the processor, and it is shown to be less sensitive to competing accesses and false sharing. Simulation under sequential consistency shows a higher average speedup for SCDS (7.1%) than for the other two schemes (4.2% for LTP and 2.7% for DSI). SCDS, however, assumes synchronization patterns that are too strict for some applications such as mp3d, so we suggest SCDS-LS (last synchronization) for suc...