Memory consistency
- A coherence protocol alone is not enough to completely specify the possible output(s) of a parallel program
- Coherence only provides the foundation for reasoning about the legal outcomes of accesses to the same memory location
- The consistency model tells us the possible outcomes arising from legal orderings of accesses to all memory locations
- A shared memory machine advertises the consistency model it supports; this is a “contract” with the writers of parallel software and the writers of parallelizing compilers
- Implementing a memory consistency model is really a hardware-software tradeoff: a strict model such as sequential consistency (SC) offers execution that is intuitive but may suffer in performance; relaxed models (e.g. release consistency, RC) make programs harder to reason about but may offer better performance
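The gap between coherence and consistency can be illustrated with the classic message-passing test. The sketch below is a toy model, not a description of any real machine: the names `P0`, `P1`, `outcomes`, and the "relaxed" rule (each processor may reorder its own accesses as long as accesses to the same location stay in program order) are assumptions made for illustration. It enumerates every outcome under SC and under that relaxed rule.

```python
from itertools import permutations

# Toy programs (hypothetical): P0 publishes data and then sets a flag,
# P1 reads the flag and then the data. All locations start at 0.
P0 = [("st", "data", 1), ("st", "flag", 1)]
P1 = [("ld", "flag", "r1"), ("ld", "data", "r2")]

def interleavings(a, b):
    """Yield every merge of a and b that preserves the order within each."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def run(order):
    """Execute one global order against a single memory; return (r1, r2)."""
    mem = {"data": 0, "flag": 0}
    regs = {}
    for kind, loc, x in order:
        if kind == "st":
            mem[loc] = x
        else:
            regs[x] = mem[loc]
    return (regs["r1"], regs["r2"])

def same_location_order_kept(original, reordered):
    """True if accesses to each single location keep their program order."""
    for loc in {op[1] for op in original}:
        before = [op for op in original if op[1] == loc]
        after = [op for op in reordered if op[1] == loc]
        if before != after:
            return False
    return True

def local_orders(prog, relaxed):
    """Orders in which one processor may perform its own accesses."""
    if not relaxed:
        return [list(prog)]  # SC: program order only
    return [list(p) for p in permutations(prog)
            if same_location_order_kept(prog, p)]

def outcomes(relaxed):
    results = set()
    for a in local_orders(P0, relaxed):
        for b in local_orders(P1, relaxed):
            for order in interleavings(a, b):
                results.add(run(order))
    return results

sc_outcomes = outcomes(relaxed=False)
relaxed_outcomes = outcomes(relaxed=True)
```

In this model `sc_outcomes` is `{(0, 0), (0, 1), (1, 1)}`, while `relaxed_outcomes` additionally contains `(1, 0)`: P1 sees the flag set but reads stale data. Every individual location still behaves coherently in that execution, which is why the consistency model, not the coherence protocol, has to rule it in or out.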
SC
- Recall that an execution is SC if its memory operations form a valid total order, i.e. an interleaving of the per-processor program orders
- The sufficient conditions require that a processor not issue a new memory operation until its previous one has completed
- This is too restrictive: it essentially prevents both the compiler and the hardware from re-ordering memory operations
- No microprocessor that supports SC implements the sufficient conditions
- Instead, out-of-order execution is allowed, and a recovery mechanism is implemented to handle memory order violations
- Let’s discuss the MIPS R10000 implementation
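The definition of SC as a valid total order can be made concrete with a small checker. This is a minimal sketch under stated assumptions: the function names `is_sc_witness` and `is_sc`, the tuple encoding of operations, and the brute-force search are all illustrative choices, and loads carry the value they were observed to return. An execution is accepted if some total order respects every program order and makes each load return the most recent store to its location.

```python
from itertools import permutations

def is_sc_witness(programs, total_order, initial=0):
    """Check one candidate total order.

    programs:    {pid: [(kind, location, value), ...]} in program order;
                 for a load, value is the value the load returned.
    total_order: list of (pid, index) pairs naming every operation once.
    """
    expected = {(p, i) for p, ops in programs.items() for i in range(len(ops))}
    if len(total_order) != len(expected) or set(total_order) != expected:
        return False
    next_index = {p: 0 for p in programs}
    memory = {}
    for pid, idx in total_order:
        if idx != next_index[pid]:
            return False  # program order of this processor is violated
        next_index[pid] += 1
        kind, loc, value = programs[pid][idx]
        if kind == "st":
            memory[loc] = value
        elif memory.get(loc, initial) != value:
            return False  # load did not return the latest store in this order
    return True

def is_sc(programs):
    """Brute force: does any total order witness this execution as SC?"""
    ops = [(p, i) for p, prog in programs.items() for i in range(len(prog))]
    return any(is_sc_witness(programs, list(order))
               for order in permutations(ops))

# Dekker-style test: each processor stores to one flag, then loads the other.
both_read_zero = {
    0: [("st", "x", 1), ("ld", "y", 0)],
    1: [("st", "y", 1), ("ld", "x", 0)],
}
one_sees_store = {
    0: [("st", "x", 1), ("ld", "y", 0)],
    1: [("st", "y", 1), ("ld", "x", 1)],
}

both_zero_is_sc = is_sc(both_read_zero)
one_sees_is_sc = is_sc(one_sees_store)
```

Here `both_zero_is_sc` is `False` and `one_sees_is_sc` is `True`: no interleaving lets both loads return 0, so hardware that lets a load bypass an earlier, still-buffered store must either prevent or repair that outcome to remain SC.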
SC in MIPS R10000
- The R10000 issues instructions out of program order, but commits them in order
- The problem is with speculatively executed loads: a load may execute and use a value long before it finally commits
- In the meantime, another processor may modify that location with a store, and the store may commit (i.e. become globally visible) before the load commits; this may violate SC (why?)
- How do you detect such a violation?
- How do you recover and guarantee an SC execution?
- Any special consideration for prefetches?
- Binding and non-binding prefetches
- In the MIPS R10000, a store remains at the head of the active list until it has completed in the cache
- Can we just remove it as soon as it issues and let the other instructions commit (the store can complete from the store buffer at a later point)? How far can we go and still guarantee SC?
- The Stanford DASH multiprocessor, on receiving a read reply that is already invalidated, forces the processor to retry that load
- Why can’t it use the value in the cache line and then discard the line?
- Does the cache controller need to take any special action when a line is replaced from the cache?
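One common way to approach the detection and recovery questions above is to check the addresses of executed-but-uncommitted loads against incoming invalidations and local replacements, and to squash and re-execute any load that matches, together with everything younger. The sketch below is a toy model of that idea only; the class `SpeculativeCore`, its methods, and the `detect` switch are hypothetical, and it is not claimed to match the R10000 design in detail (real hardware would flush and refetch rather than re-read at commit).

```python
class SpeculativeCore:
    """Toy core that executes loads out of order and commits them in order."""

    def __init__(self, memory, program, detect=True):
        self.memory = memory  # shared dict: address -> value
        # One entry per load in program order (a stand-in for the active list).
        self.window = [{"addr": a, "value": None, "done": False}
                       for a in program]
        self.head = 0         # index of the oldest uncommitted load
        self.detect = detect  # assumption: turn violation detection on or off
        self.committed = []   # values in commit order
        self.squashed = 0     # number of loads that had to be re-executed

    def execute(self, i):
        """Speculatively perform load i now; it commits later."""
        entry = self.window[i]
        entry["value"] = self.memory[entry["addr"]]
        entry["done"] = True

    def snoop(self, addr):
        """External invalidation, or local replacement, of the line at addr."""
        if not self.detect:
            return
        for i in range(self.head, len(self.window)):
            entry = self.window[i]
            if entry["done"] and entry["addr"] == addr:
                # Squash the matching load and every younger executed load.
                for younger in self.window[i:]:
                    if younger["done"]:
                        younger["done"] = False
                        self.squashed += 1
                return

    def commit(self):
        """Commit the oldest load, re-executing it first if it was squashed."""
        entry = self.window[self.head]
        if not entry["done"]:
            self.execute(self.head)
        self.committed.append(entry["value"])
        self.head += 1

def run(detect):
    memory = {"flag": 0, "data": 0}
    # Program order on this core: load flag, then load data.
    core = SpeculativeCore(memory, ["flag", "data"], detect)
    core.execute(1)        # the younger load of data runs early and reads 0
    memory["data"] = 1     # another processor stores data ...
    core.snoop("data")     # ... and its invalidation reaches this core
    memory["flag"] = 1     # the other processor then stores flag
    core.snoop("flag")
    core.execute(0)        # the older load of flag finally runs and reads 1
    core.commit()
    core.commit()
    return tuple(core.committed)

with_detection = run(detect=True)
without_detection = run(detect=False)
```

In this model `without_detection` is `(1, 0)`, the non-SC outcome in which the flag is seen set but the data is stale, while `with_detection` is `(1, 1)`. Treating a replacement like an invalidation in `snoop` reflects the assumption that once a line leaves the cache, the core may no longer be notified of later stores to it.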