Module 12: "Multiprocessors on a Snoopy Bus"
  Lecture 26: "Case Studies"
 

Write atomicity and SC

  • Sequential consistency (SC) requires write atomicity, i.e., the total order of all writes seen by all processors must be identical
    • Since a BusRdX or BusUpgr does not wait until the invalidations are actually applied to the caches, you have to be careful

      P0: A=1; B=1;
      P1: print B; print A

    • Under SC (A, B) = (0, 1) is not allowed
    • Suppose that, to start with, P1 has the line containing A in its cache, but not the line containing B
    • P0’s store to A queues an invalidation of A in P1’s cache controller
    • P1 takes a read miss for B, but the response for B is re-ordered by P1’s cache controller so that it overtakes the invalidation (the controller may reason that it is better to prioritize reads); P1 then reads the new value of B but the stale value of A, producing (A, B) = (0, 1); a code sketch of this litmus test follows the list
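
The forbidden outcome above can be written as a small litmus test. Below is a minimal sketch in C11 (assuming an implementation that provides <threads.h>; the harness and variable names are illustrative, not from the lecture). The default sequentially consistent atomics model exactly the property the hardware must preserve: if P1 observes B = 1, it must also observe A = 1.

    /* Litmus test for the first example: under SC the output
     * "B=1 A=0" must never appear. */
    #include <stdatomic.h>
    #include <stdio.h>
    #include <threads.h>

    atomic_int A = 0, B = 0;

    int p0(void *arg) {                /* P0: A = 1; B = 1; */
        (void)arg;
        atomic_store(&A, 1);
        atomic_store(&B, 1);
        return 0;
    }

    int p1(void *arg) {                /* P1: print B; print A; */
        (void)arg;
        int b = atomic_load(&B);
        int a = atomic_load(&A);
        printf("B=%d A=%d\n", b, a);   /* "B=1 A=0" is forbidden under SC */
        return 0;
    }

    int main(void) {
        thrd_t t0, t1;
        thrd_create(&t0, p0, NULL);
        thrd_create(&t1, p1, NULL);
        thrd_join(t0, NULL);
        thrd_join(t1, NULL);
        return 0;
    }

The re-ordering described above, where a read response overtakes a queued invalidation, is precisely what would make the forbidden line appear on real hardware.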

Another example

P0: A=1; print B;

P1: B=1; print A;

  • Under SC (A, B) = (0, 0) is not allowed
  • The same problem arises if P0 executes both of its instructions first and then P1 executes the write of B; assume this write generates an upgrade, so it is marked complete as soon as the address arbitration phase finishes; the upgrade completion is then re-ordered with the pending invalidation of A, so P1 reads the stale value of A
  • So, the reason these two cases fail is that the new values are made visible before older invalidations are applied
  • One solution is to have a strict FIFO queue between the bus controller and the cache hierarchy
  • But a strict FIFO is not necessary: it is sufficient that replies do not overtake invalidations; beyond that, the bus responses can be re-ordered without violating write atomicity and hence SC (e.g., if there are only read and write responses in the queue, it sometimes may make sense to prioritize read responses); a sketch of such a queue follows this list
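
To make the relaxed ordering concrete, here is a minimal sketch of such an inbound queue (the array-based layout, type names, and selection policy are assumptions for illustration, not the lecture's actual design): a read reply may be promoted past other replies, but never past an invalidation that arrived before it.

    /* Inbound queue between the bus controller and the cache hierarchy.
     * Entries are stored in arrival order; dequeue/compaction is omitted. */
    #include <stddef.h>

    typedef enum { MSG_INVALIDATION, MSG_READ_REPLY, MSG_WRITE_REPLY } msg_kind_t;

    typedef struct {
        msg_kind_t    kind;
        unsigned long addr;
    } bus_msg_t;

    #define QMAX 16

    typedef struct {
        bus_msg_t slot[QMAX];   /* slot[0] is the oldest entry */
        size_t    count;
    } inbound_q_t;

    /* Pick the next message to hand to the cache hierarchy.  A read reply
     * may be preferred, but only if no older invalidation is ahead of it. */
    static size_t next_to_deliver(const inbound_q_t *q) {
        for (size_t i = 0; i < q->count; i++) {
            if (q->slot[i].kind == MSG_INVALIDATION)
                return 0;   /* an invalidation is ahead: deliver strictly in order */
            if (q->slot[i].kind == MSG_READ_REPLY)
                return i;   /* safe to promote: only other replies are ahead of it */
        }
        return 0;           /* only write replies (or empty): deliver in order */
    }

The invariant is that next_to_deliver never returns an index beyond a queued invalidation, so the stale copy of A in the examples above is always invalidated before any newer value becomes visible to the processor.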

In-order response

  • In-order response can simplify quite a few things in the design
    • The fully associative request table can be replaced by a FIFO queue
    • Conflicting requests where one is a write can actually be allowed now (multiple reads were allowed even before although only the first one actually appears on the bus)
    • Consider a BusRdX followed by a BusRd from two different processors
    • With in-order response it is guaranteed that the BusRdX response will be granted the data bus before the BusRd response (which may not be true for out-of-order responses, and hence such a conflict is disallowed in that case)
    • So when the cache controller that generated the BusRdX sees the BusRd, it only notes that it should source the line for this request after its own write has completed (sketched at the end of this section)
  • The performance penalty may be huge
    • Essentially because of the memory system, since responses from different banks become ready at very different times
    • Consider a situation where three requests are pending to cache lines A, B, C in that order
    • A and B map to the same memory bank while C is in a different bank
    • Although the response for C may be ready long before that of B, it cannot be granted the data bus until B’s response has been delivered (see the timing sketch below)
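
The bookkeeping for the conflicting BusRdX/BusRd case can be sketched as follows (type and function names are hypothetical, not from the lecture): the controller that issued the BusRdX merely records that it must source the line once its own write completes, which is safe only because in-order response guarantees that its data arrives before the reader's.

    #include <stdbool.h>

    typedef struct {
        unsigned long line_addr;
        bool          rdx_outstanding;    /* our BusRdX has not been answered yet */
        bool          must_source_later;  /* a conflicting BusRd was snooped */
    } pending_write_t;

    /* Called when a BusRd is snooped on the bus. */
    static void snoop_busrd(pending_write_t *w, unsigned long addr) {
        if (w->rdx_outstanding && addr == w->line_addr)
            w->must_source_later = true;  /* just note it; our response comes first */
    }

    /* Called when our BusRdX response arrives and the write completes. */
    static void write_complete(pending_write_t *w) {
        w->rdx_outstanding = false;
        if (w->must_source_later) {
            /* supply_line_on_bus(w->line_addr);  hypothetical cache-to-cache transfer */
            w->must_source_later = false;
        }
    }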
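
The cost of in-order response with bank conflicts can be seen with a toy timing model (all cycle numbers are made up and purely illustrative): C's data must wait for B's on the data bus even though it was ready much earlier.

    #include <stdio.h>

    typedef struct {
        const char *name;
        int         ready_cycle;   /* cycle at which the memory bank has the data */
    } req_t;

    int main(void) {
        /* A and B map to the same bank, so B is serialized behind A;
         * C sits in a different bank and is ready early. */
        req_t reqs[] = { {"A", 10}, {"B", 20}, {"C", 12} };
        int bus_free = 0;

        /* In-order response: grant the data bus strictly in request order. */
        for (int i = 0; i < 3; i++) {
            int grant = reqs[i].ready_cycle > bus_free ? reqs[i].ready_cycle : bus_free;
            printf("in-order: %s gets the data bus at cycle %d\n", reqs[i].name, grant);
            bus_free = grant + 4;   /* assume a 4-cycle data transfer */
        }
        return 0;
    }

With these numbers C is granted the data bus only at cycle 24, after B; an out-of-order response could have sent C at cycle 14, as soon as A's transfer finished.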