Module 12: "Multiprocessors on a Snoopy Bus"
  Lecture 24: "Write Serialization in a Simple Design"
 

Inherently non-atomic

  • Even though the bus is atomic, a complete protocol transaction involves quite a few steps which together form a non-atomic transaction
    • Issuing processor request
    • Looking up cache tags
    • Arbitrating for bus
    • Snoop action in other cache controller
    • Refill in requesting cache controller at the end
  • Different requests from different processors may be in different phases of their transactions at the same time
    • This makes a protocol transition inherently non-atomic
  • Consider an example
    • P0 and P1 have cache line C in shared state
    • Both proceed to write the line
    • Both cache controllers look up the tags, put a BusUpgr into the bus request queue, and start arbitrating for the bus
    • P1 gets the bus first and launches its BusUpgr
    • P0 observes the BusUpgr; it must now invalidate C in its cache and, since it no longer holds a valid copy, change its own pending request type from BusUpgr to BusRdX
    • So every cache controller needs to do an associative lookup of the snoop address against its pending request queue and take appropriate action depending on the request type
  • One way to reason about the correctness is to introduce transient states
    • Possible to think of the last problem as the line C being in a transient S→M state
    • On observing a BusUpgr or BusRdX, this state transitions to I→M which is also transient
    • The line C goes to stable M state only after the transaction completes
    • These transient states are not actually encoded in the state bits of a cache line, because at any point in time there is only a small number of outstanding requests from a particular processor (today the maximum I know of is 16)
    • Instead, these states are determined by the state of the outstanding line and the state of the cache controller
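The P0/P1 race and the transient-state bookkeeping above can be sketched in software. This is only an illustrative model, not a real controller design; all names (PendingRequest, start_upgrade, snoop, etc.) are hypothetical:

```python
# Illustrative model of a cache controller's pending-request queue and the
# transient states S->M and I->M described above. Hardware implements this
# with comparators on a small outstanding-request buffer, not a Python list.

S, M, I = "S", "M", "I"                 # stable states
S_TO_M, I_TO_M = "S->M", "I->M"         # transient states (upgrade in flight)

class PendingRequest:
    def __init__(self, addr, req_type, transient):
        self.addr = addr
        self.req_type = req_type        # "BusUpgr" or "BusRdX"
        self.transient = transient      # transient state of the line

class CacheController:
    def __init__(self):
        self.pending = []               # small queue of outstanding requests

    def start_upgrade(self, addr):
        # Processor writes a line held in S: queue a BusUpgr; the line is
        # now conceptually in the transient S->M state.
        self.pending.append(PendingRequest(addr, "BusUpgr", S_TO_M))

    def snoop(self, addr, bus_req):
        # Associative lookup of the snoop address against pending requests.
        for req in self.pending:
            if req.addr == addr and bus_req in ("BusUpgr", "BusRdX"):
                # Our S copy is invalidated by the other writer; we now need
                # the data too, so a queued BusUpgr must become a BusRdX and
                # the transient state moves from S->M to I->M.
                if req.req_type == "BusUpgr":
                    req.req_type = "BusRdX"
                    req.transient = I_TO_M

    def complete(self, addr):
        # Transaction done: the line reaches stable M, the request retires.
        self.pending = [r for r in self.pending if r.addr != addr]
        return M

# The race above: P0 and P1 both hold line C in S and both try to write it.
p0, p1 = CacheController(), CacheController()
p0.start_upgrade("C")
p1.start_upgrade("C")
# P1 wins arbitration and its BusUpgr appears on the bus; P0 snoops it.
p0.snoop("C", "BusUpgr")
assert p0.pending[0].req_type == "BusRdX"
assert p0.pending[0].transient == I_TO_M
```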

Write serialization

  • An atomic bus makes write serialization rather easy, but optimizations are possible
    • Consider a processor write to a shared cache line
    • Is it safe to continue with the write and change the state to M even before the bus transaction is complete?
    • After the bus transaction is launched it is entirely safe: the bus is atomic, so the position of the write is already committed in the total order and there is no need to wait any further (note that the exact point in time at which the other caches invalidate the line is not important)
    • If the processor decides to proceed even before the bus transaction is launched (very much possible with out-of-order execution), the cache controller must take the responsibility of squashing and re-executing the offending instructions so that the total order remains consistent across the system
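The safety argument above reduces to a small decision rule. A minimal sketch, assuming a hypothetical helper name and boolean inputs (neither is from any real design):

```python
# Hypothetical decision rule: when may a core mark a shared line M and
# complete the write before the other caches have invalidated their copies?

def may_complete_write_early(bus_txn_launched, speculation_supported):
    """Return (safe, action) for a write to a line currently in state S.

    bus_txn_launched:      the BusUpgr/BusRdX has already won arbitration
                           and appeared on the atomic bus.
    speculation_supported: the controller can squash and re-execute
                           instructions that observed the write too early.
    """
    if bus_txn_launched:
        # The bus is atomic, so the write's position in the total order is
        # already committed: always safe, no further waiting needed.
        return True, "complete write, set state M"
    if speculation_supported:
        # Proceed speculatively; if another write to the line is ordered
        # first on the bus, squash and re-execute the offending instructions.
        return True, "complete speculatively, be ready to squash"
    return False, "stall until the bus transaction is launched"
```

The first case is the key optimization in the notes: once the transaction is on the atomic bus, waiting for the invalidations themselves buys nothing.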

Fetch deadlock

  • Just a fancy name for a pretty intuitive deadlock
    • Suppose P0’s cache controller is waiting to get the bus for launching a BusRdX to cache line A
    • P1 has a modified copy of cache line A
    • P1 has launched a BusRd to cache line B and awaiting completion
    • P0 has a modified copy of cache line B
    • If both keep on waiting without responding to snoop requests, the deadlock cycle is pretty obvious
    • So every controller must continue to respond to snoop requests while waiting for the bus for its own requests
    • Normally the cache controller is therefore designed as two independent logic units: an inbound unit that handles snoop requests and an outbound unit that handles the processor's own requests and arbitrates for the bus
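The inbound/outbound split can be illustrated with a toy model of the P0/P1 cycle above. All names here (SplitController, inbound_snoop, etc.) are made up for illustration:

```python
# Toy model of why the inbound/outbound split avoids fetch deadlock: the
# inbound unit keeps servicing snoops even while the outbound unit is
# blocked waiting for the bus.

class SplitController:
    def __init__(self, name, modified_lines):
        self.name = name
        self.modified = set(modified_lines)  # lines this cache holds in M
        self.waiting_for_bus = None          # outbound unit's stalled request

    def outbound_request(self, line):
        # Outbound unit: issue a request and wait for bus arbitration.
        self.waiting_for_bus = line

    def inbound_snoop(self, line):
        # Inbound unit: must keep responding even while the outbound unit
        # waits, otherwise the owner of the line never supplies the data.
        if line in self.modified:
            self.modified.discard(line)      # write back / hand off the line
            return True                      # snoop serviced
        return False

# The deadlock scenario: P0 owns B and wants A; P1 owns A and wants B.
p0 = SplitController("P0", modified_lines={"B"})
p1 = SplitController("P1", modified_lines={"A"})
p0.outbound_request("A")   # P0 waits for the bus to launch BusRdX for A
p1.outbound_request("B")   # P1 waits for its BusRd for B to complete
# Because the inbound units stay responsive, each request gets serviced
# and the cycle never forms:
assert p1.inbound_snoop(p0.waiting_for_bus)  # P1 supplies A to P0
assert p0.inbound_snoop(p1.waiting_for_bus)  # P0 supplies B to P1
```

If either controller refused to snoop while waiting (a single monolithic unit), both asserts would correspond to requests that are never answered, which is exactly the deadlock cycle described above.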