Module 15: "Memory Consistency Models"
  Lecture 34: "Sequential Consistency and Relaxed Models"
 

Memory consistency

  • A coherence protocol alone is not enough to completely specify the output(s) of a parallel program
    • The coherence protocol only provides the foundation for reasoning about the legal outcomes of accesses to the same memory location
    • The consistency model tells us the possible outcomes arising from legal orderings of accesses to all memory locations
    • A shared memory machine advertises the supported consistency model; it is a “contract” with the writers of parallel software and the writers of parallelizing compilers
    • Implementing a memory consistency model is really a hardware-software tradeoff: the strict sequential consistency model (SC) offers execution that is intuitive, but may suffer in terms of performance; relaxed models (e.g. release consistency, RC) make program reasoning difficult, but may offer better performance
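
To make the gap between coherence and consistency concrete, here is a minimal sketch in Python. It is a toy model, not a description of any real machine: the two-processor "message passing" program, the location names data and flag, and the registers r1 and r2 are all assumptions chosen for illustration. The code enumerates every global order of the four operations twice: once keeping all program order (SC), and once keeping program order only between accesses to the same location, which is roughly the most that a per-location coherence guarantee alone could give.

```python
from itertools import permutations

# Each operation is (processor, kind, location, value-or-register).
# P0 publishes data and then sets a flag; P1 reads the flag, then the data.
P0 = [("P0", "W", "data", 1), ("P0", "W", "flag", 1)]
P1 = [("P1", "R", "flag", "r1"), ("P1", "R", "data", "r2")]
OPS = P0 + P1


def respects_program_order(order, same_location_only):
    """Return True if `order` keeps each processor's operations in program order.

    With same_location_only=True, only pairs that touch the same location
    must stay ordered (a per-location, coherence-style guarantee).
    """
    position = {op: i for i, op in enumerate(order)}
    for program in (P0, P1):
        for i in range(len(program)):
            for j in range(i + 1, len(program)):
                first, second = program[i], program[j]
                if same_location_only and first[2] != second[2]:
                    continue  # different locations: no ordering required
                if position[first] > position[second]:
                    return False
    return True


def run(order):
    """Execute the operations in the given global order; return (r1, r2)."""
    memory = {"data": 0, "flag": 0}
    registers = {}
    for _, kind, location, arg in order:
        if kind == "W":
            memory[location] = arg
        else:
            registers[arg] = memory[location]
    return (registers["r1"], registers["r2"])


def outcomes(same_location_only):
    """Collect every (r1, r2) reachable under the chosen ordering rule."""
    return {
        run(order)
        for order in permutations(OPS)
        if respects_program_order(order, same_location_only)
    }


sc_outcomes = outcomes(same_location_only=False)
per_location_outcomes = outcomes(same_location_only=True)

print(sorted(sc_outcomes))            # [(0, 0), (0, 1), (1, 1)]
print(sorted(per_location_outcomes))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The extra outcome (r1, r2) = (1, 0) means P1 sees the flag set but still reads stale data. Nothing about ordering accesses to a single location rules this out; only the consistency model does.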

SC

  • Recall that an execution is SC if the memory operations form a valid total order, i.e. an interleaving of the per-processor program orders
    • The sufficient conditions require that a processor not issue a new memory operation until its previous one has completed
    • This is too restrictive: it essentially forbids both the compiler and the hardware from re-ordering memory instructions
    • No microprocessor that supports SC implements the sufficient conditions directly
    • Instead, out-of-order execution is allowed freely, and a recovery mechanism is implemented to handle memory order violations
    • Let’s discuss the MIPS R10000 implementation
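
The definition of SC as an interleaving of program orders can be checked by brute force on a small example. The sketch below is illustrative only: the two-processor program (each processor stores to one location and then loads the other, as in Dekker's algorithm), the locations x and y, and the registers r1 and r2 are assumptions, not taken from the lecture. It lists the outcomes SC allows, then shows what happens if one processor's load is hoisted above its own store, which is the kind of re-ordering a compiler or an out-of-order core might otherwise perform.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def execute(order):
    """Run operations (kind, location, value-or-register) in order; return (r1, r2)."""
    memory = {"x": 0, "y": 0}
    registers = {}
    for kind, location, arg in order:
        if kind == "W":
            memory[location] = arg
        else:
            registers[arg] = memory[location]
    return (registers["r1"], registers["r2"])


# Program order of each processor.
P0 = [("W", "x", 1), ("R", "y", "r1")]
P1 = [("W", "y", 1), ("R", "x", "r2")]

# SC: any interleaving that preserves both program orders.
sc = {execute(order) for order in interleavings(P0, P1)}

# Hypothetical re-ordering: P0's load is performed before its store.
P0_reordered = [P0[1], P0[0]]
reordered = {execute(order) for order in interleavings(P0_reordered, P1)}

print(sorted(sc))         # [(0, 1), (1, 0), (1, 1)]
print(sorted(reordered))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Under SC both loads cannot return 0, because that would require each load to precede the other processor's store and each store to precede its own processor's load, which is a cycle. A single store-load re-ordering is enough to make (0, 0) appear, which is why hardware that re-orders must be able to detect and undo the cases that become visible.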

SC in MIPS R10000

  • Issues instructions out of program order, but commits in order
    • The problem is with speculatively executed loads: a load may execute and use a value long before it finally commits
    • In the meantime, some other processor may modify that value through a store, and the store may commit (i.e. become globally visible) before the load commits; this may violate SC (why?)
    • How do you detect such a violation?
    • How do you recover and guarantee an SC execution?
    • Any special consideration for prefetches?
      • Binding and non-binding prefetches
  • In the MIPS R10000, a store remains at the head of the active list until it has completed in the cache
    • Can we just remove it as soon as it issues and let the other instructions commit (the store can complete from store buffer at a later point)? How far can we go and still guarantee SC?
  • The Stanford DASH multiprocessor, on receiving a read reply that is already invalidated, forces the processor to retry that load
    • Why can’t it use the value in the cache line and then discard the line?
  • Does the cache controller need to take any special action when a line is replaced from the cache?
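
One possible way to think about the detection and recovery questions above is sketched below. This is a toy model under stated assumptions, not the documented R10000 or DASH design: the names LoadQueue, SpeculativeLoad, on_line_lost, and the 64-byte LINE_SIZE are all hypothetical. The idea is to remember every load that has executed but not yet committed, and to treat the loss of its cache line (whether through an external invalidation or a local replacement) as a signal that the loaded value may be stale, so that the load and everything younger is squashed and replayed.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64  # bytes per cache line; an assumed value for illustration


@dataclass
class SpeculativeLoad:
    seq: int      # position in program order (smaller = older)
    address: int  # byte address that was read
    value: int    # value obtained when the load executed early


@dataclass
class LoadQueue:
    """Hypothetical queue of loads that have executed but not yet committed."""
    entries: list = field(default_factory=list)  # kept sorted by seq

    def execute_load(self, seq, address, value):
        # Loads may execute out of order; keep the queue in program order.
        self.entries.append(SpeculativeLoad(seq, address, value))
        self.entries.sort(key=lambda entry: entry.seq)

    def commit_oldest(self):
        # In-order commit: the oldest load leaves the speculative window.
        return self.entries.pop(0)

    def on_line_lost(self, address):
        """Handle an invalidation or a replacement of the line holding `address`.

        If an uncommitted load read from that line, discard it and every
        younger load, and return the seq number to replay from.
        Return None if no uncommitted load is affected.
        """
        line = address // LINE_SIZE
        for entry in self.entries:
            if entry.address // LINE_SIZE == line:
                replay_from = entry.seq
                self.entries = [e for e in self.entries if e.seq < replay_from]
                return replay_from
        return None


# Usage: the younger load (seq 7) executes before the older one (seq 5).
demo_queue = LoadQueue()
demo_queue.execute_load(seq=7, address=0x1008, value=0)
demo_queue.execute_load(seq=5, address=0x2000, value=3)

# Another processor's store invalidates the line that contains 0x1008.
replay_from = demo_queue.on_line_lost(0x1000)
print(replay_from)                           # 7
print([e.seq for e in demo_queue.entries])   # [5]
```

In this sketch a replacement is handled exactly like an invalidation. The reasoning is conservative: once the line has left the cache, the processor may no longer be notified of later writes to it, so an uncommitted load that read from it can no longer be shown to be safe.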