Module 10: "Design of Shared Memory Multiprocessors"
  Lecture 19: "Sequential Consistency and Cache Coherence Protocols"
 

OOO and SC

  • Consider a simple example (all variables are initially zero)

    P0: x=w+1; r=y+1;

    P1: y=2; w=y+1;

    • Suppose the load of w in P0 misses in the cache, so w is unavailable for a long time and x=w+1 cannot complete immediately; the load eventually returns w=3
    • Meanwhile, inside the out-of-order processor, r=y+1 completes (but does not commit) before x=w+1 and reads the old value y=0 (possibly from the cache); the instructions eventually commit in program order, leaving x=4, r=1, y=2, w=3
    • So we have the following partial orders

      P0: x=w+1 < r=y+1 and P1: y=2 < w=y+1

      Cross-thread: w=y+1 < x=w+1 (P0 read w=3, the value written by P1) and r=y+1 < y=2 (P0 read the old value y=0)

    • Combine these to get a contradictory total order: x=w+1 < r=y+1 < y=2 < w=y+1 < x=w+1, i.e. x=w+1 would have to precede itself
  • What went wrong?
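
The contradiction can also be checked mechanically by treating each "a < b" relation as an edge in a precedence graph and looking for a cycle. The following is a minimal sketch, assuming Python 3.9 or later for the standard `graphlib` module; the names `program_order`, `observed_order`, and `total_order` are hypothetical and introduced only for illustration.

```python
from graphlib import TopologicalSorter, CycleError

# Each pair (a, b) means "a must appear before b" in the total order.
program_order = [("x=w+1", "r=y+1"), ("y=2", "w=y+1")]
# Orders forced by the values observed in the out-of-order execution.
observed_order = [("w=y+1", "x=w+1"), ("r=y+1", "y=2")]


def total_order(constraints):
    """Return one total order satisfying all constraints, or None if they are cyclic."""
    sorter = TopologicalSorter()
    for before, after in constraints:
        sorter.add(after, before)  # 'after' depends on 'before'
    try:
        return list(sorter.static_order())
    except CycleError:
        return None


# Program order alone is satisfiable, so some total order is returned.
print(total_order(program_order))
# Adding the observed cross-thread orders creates a cycle, so no total order exists.
print(total_order(program_order + observed_order))  # prints None
```

A result of `None` for the combined constraints means no single interleaving can explain the values x=4, r=1, so this execution is not sequentially consistent.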

SC example

  • Consider the following example

    P0: A=1; print B;

    P1: B=1; print A;

  • Possible outcomes for an SC machine, written as the printed values (A, B)
    • (A, B) = (0,1); interleaving: B=1; print A; A=1; print B
    • (A, B) = (1,0); interleaving: A=1; print B; B=1; print A
    • (A, B) = (1,1); interleavings: A=1; B=1; print A; print B
                                              A=1; B=1; print B; print A
                                              (and the same two with B=1 before A=1)
    • (A, B) = (0,0) is impossible: printing A=0 requires the read of A to occur before the write of A, and printing B=0 requires the read of B to occur before the write of B, i.e. print A < A=1 and print B < B=1; but program order gives A=1 < print B and B=1 < print A; chaining these gives print B < B=1 < print A < A=1 < print B, which implies print B < print B, a contradiction
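
The list of outcomes above can be reproduced by brute force: enumerate every interleaving that respects each processor's program order and execute it against a single shared memory. This is a minimal sketch; the helper names `interleavings` and `run` and the tuple encoding of operations are assumptions made for illustration.

```python
def interleavings(p0, p1):
    """Yield every merge of p0 and p1 that preserves each program's own order."""
    if not p0 or not p1:
        yield list(p0) + list(p1)
        return
    for rest in interleavings(p0[1:], p1):
        yield [p0[0]] + rest
    for rest in interleavings(p0, p1[1:]):
        yield [p1[0]] + rest


# P0: A=1; print B;    P1: B=1; print A;
P0 = [("write", "A"), ("print", "B")]
P1 = [("write", "B"), ("print", "A")]


def run(schedule):
    """Execute one interleaving on a single shared memory; return printed (A, B)."""
    memory = {"A": 0, "B": 0}
    printed = {}
    for op, var in schedule:
        if op == "write":
            memory[var] = 1
        else:
            printed[var] = memory[var]
    return (printed["A"], printed["B"])


outcomes = {run(schedule) for schedule in interleavings(P0, P1)}
print(sorted(outcomes))  # [(0, 1), (1, 0), (1, 1)]
```

Of the six legal interleavings, none produces (0, 0), which matches the ordering argument.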

Implementing SC

  • Two basic requirements
    • Memory operations issued by a processor must become visible to others in program order
    • Need to make sure that all processors see the same total order of memory operations: in the previous example for the (0,1) case both P0 and P1 should see the same interleaving: B=1; print A; A=1; print B
  • The tricky part is to make sure that writes become visible in the same order to all processors
    • Write atomicity: as if each write is an atomic operation
    • Otherwise, two processors may end up observing writes in different orders and using different values (which may still be correct from the viewpoint of cache coherence, but will violate SC)

Write atomicity

  • Example (A=0, B=0 initially)

    P0: A=1;

    P1: while (!A); B=1;

    P2: while (!B); print A;

  • A correct execution on an SC machine should print A=1
    • A=0 will be printed only if the write to A is not yet visible to P2, even though it is clearly visible to P1, since P1 came out of its loop
    • Thus A=0 is possible only if P1 sees the order A=1 < B=1 while P2 sees the order B=1 < A=1, i.e. from the viewpoint of the whole system the write A=1 was not “atomic”
    • Without write atomicity P2 may proceed to print 0 with a stale value from its cache
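
The timing behind this example can be sketched with a small deterministic model. The function `printed_value` and its three delay parameters are hypothetical, introduced only for illustration: the model assumes A=1 is issued at time 0 and that each write reaches each other processor after a fixed propagation delay.

```python
def printed_value(a_to_p1, a_to_p2, b_to_p2):
    """Return the value of A that P2 prints under the given propagation delays.

    a_to_p1: time for the write A=1 to become visible to P1
    a_to_p2: time for the write A=1 to become visible to P2
    b_to_p2: time for the write B=1 to become visible to P2
    """
    p1_exits_loop = a_to_p1             # P1 leaves while(!A) once it sees A=1
    b_issued = p1_exits_loop            # P1 then issues B=1
    p2_exits_loop = b_issued + b_to_p2  # P2 leaves while(!B) once it sees B=1
    # P2 reads A right after leaving its loop; it sees 1 only if A=1 has arrived.
    return 1 if a_to_p2 <= p2_exits_loop else 0


# Atomic write: A=1 becomes visible to P1 and P2 at the same time.
print(printed_value(a_to_p1=5, a_to_p2=5, b_to_p2=0))   # prints 1
# Non-atomic write: A=1 reaches P1 quickly but reaches P2 late.
print(printed_value(a_to_p1=1, a_to_p2=10, b_to_p2=2))  # prints 0
```

Whenever `a_to_p1 == a_to_p2` and delays are non-negative, the model always prints 1; the stale value 0 appears only when the write to A reaches P2 later than the chain through P1 and B does.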

Summary of SC

  • Program order from each processor creates a partial order among memory operations
  • Interleaving of these partial orders defines a total order
  • Sequential consistency: the result of an execution must match one of these many possible total orders
  • A multiprocessor is said to be SC if every execution on this machine is SC compliant
  • Sufficient but not necessary conditions for SC
    • Every processor issues memory operations in program order
    • Every processor waits for write to complete before issuing the next operation
    • Every processor waits for read to complete and the write that affects the returned value to complete before issuing the next operation (important for write atomicity)