Inherently non-atomic
- Even though the bus is atomic, a complete protocol transaction involves quite a few steps which together form a non-atomic transaction
- Issuing processor request
- Looking up cache tags
- Arbitrating for bus
- Snoop action in other cache controller
- Refill in requesting cache controller at the end
- Different requests from different processors may be in different phases of their transactions at the same time
- This makes a protocol transition inherently non-atomic
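- A minimal C++ sketch of the phases listed above (the enum and the snapshot are purely illustrative assumptions, not any real controller's interface), showing how two processors' requests can sit in different phases at the same instant:

#include <cstdio>

// Hypothetical phases of one snoopy-bus coherence transaction.
enum class Phase {
    IssueRequest,    // processor issues the load/store
    TagLookup,       // controller looks up its cache tags
    BusArbitration,  // controller arbitrates for the bus
    SnoopAction,     // other controllers snoop and act
    Refill           // requesting controller refills the line
};

int main() {
    // Snapshot at one instant: P0 and P1 are in different phases of
    // their own transactions, which is why a transition is non-atomic.
    Phase p0 = Phase::BusArbitration;
    Phase p1 = Phase::SnoopAction;
    std::printf("P0 in phase %d, P1 in phase %d\n",
                static_cast<int>(p0), static_cast<int>(p1));
    return 0;
}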
- Consider an example
- P0 and P1 have cache line C in shared state
- Both proceed to write the line
- Both cache controllers look up the tags, put a BusUpgr into the bus request queue, and start arbitrating for the bus
- P1 gets the bus first and launches its BusUpgr
- P0 observes the BusUpgr and now it must invalidate C in its cache and change the request type to BusRdX
- So every cache controller needs to do an associative lookup of the snoop address against its pending request queue and, depending on the request type, take the appropriate action (a sketch follows this list)
- One way to reason about the correctness is to introduce transient states
- It is possible to think of the previous example as line C being in a transient S→M state
- On observing a BusUpgr or BusRdX, this state transitions to I→M, which is also transient
- The line C goes to stable M state only after the transaction completes
- These transient states are not really encoded in the state bits of a cache line because at any point in time there will be a small number of outstanding requests from a particular processor (today the maximum I know of is 16)
- These states are really determined by the state of an outstanding line and the state of the cache controller
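- A minimal software sketch, assuming illustrative names (PendingRequest, OutboundQueue, on_snoop are not a real controller's interface), of how the controller could match a snooped address against its small pending-request queue and patch a transient S→M request into I→M with a BusRdX:

#include <cstdint>
#include <vector>

enum class BusOp { BusRd, BusRdX, BusUpgr };

// Transient states: the line is "in flight" between two stable states.
enum class Transient { S_to_M, I_to_M, I_to_S };

struct PendingRequest {
    uint64_t  line_addr;   // cache line address of the outstanding request
    BusOp     op;          // bus transaction we intend to launch
    Transient state;       // transient state of the line while we wait
};

struct OutboundQueue {
    std::vector<PendingRequest> pending;   // small: e.g. up to ~16 entries

    // Called for every snooped bus transaction: associatively match the
    // snoop address against our own pending requests.
    void on_snoop(uint64_t snoop_addr, BusOp snoop_op) {
        for (auto& req : pending) {
            if (req.line_addr != snoop_addr) continue;
            // Another processor invalidated the line we were upgrading:
            // our copy is gone, so a BusUpgr is no longer sufficient.
            if (req.state == Transient::S_to_M &&
                (snoop_op == BusOp::BusUpgr || snoop_op == BusOp::BusRdX)) {
                req.state = Transient::I_to_M;   // still transient
                req.op    = BusOp::BusRdX;       // must now fetch the data too
            }
        }
    }
};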
Write serialization
- The atomic bus makes write serialization rather easy, but optimizations are possible
- Consider a processor write to a shared cache line
- Is it safe to continue with the write and change the state to M even before the bus transaction is complete?
- After the bus transaction is launched it is totally safe because the bus is atomic and hence the position of the write is committed in the total order; therefore no need to wait any further (note that the exact point in time when the other caches invalidate the line is not important)
- If the processor decides to proceed even before the bus transaction is launched (very much possible with out-of-order execution), the cache controller must take the responsibility of squashing and re-executing the offending instructions so that the total order is consistent across the system
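- A minimal sketch of this decision point, assuming hypothetical names (UpgradeStatus, may_complete_write) just to make the reasoning concrete:

enum class LineState { I, S, M };

struct UpgradeStatus {
    bool bus_txn_launched;   // BusUpgr/BusRdX has won arbitration and appeared on the bus
    bool bus_txn_complete;   // all snoopers have acted and the transaction retired
};

// Returns true if the store may retire and the line may flip to M.
bool may_complete_write(const UpgradeStatus& s, LineState& line) {
    if (s.bus_txn_launched) {
        // The atomic bus has already fixed this write's position in the
        // total order; the exact moment other caches invalidate is irrelevant.
        line = LineState::M;
        return true;
    }
    // Proceeding before launch is possible (e.g. with out-of-order execution),
    // but then the controller must be prepared to squash and re-execute the
    // store if a conflicting bus transaction is observed first.
    return false;
}

int main() {
    LineState line = LineState::S;
    UpgradeStatus before_launch{false, false};
    UpgradeStatus after_launch{true, false};
    bool early = may_complete_write(before_launch, line);  // false: must wait or be ready to squash
    bool late  = may_complete_write(after_launch,  line);  // true: position in total order is fixed
    return (!early && late && line == LineState::M) ? 0 : 1;
}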
Fetch deadlock
- Just a fancy name for a pretty intuitive deadlock
- Suppose P0’s cache controller is waiting to get the bus for launching a BusRdX to cache line A
- P1 has a modified copy of cache line A
- P1 has launched a BusRd to cache line B and awaiting completion
- P0 has a modified copy of cache line B
- If both keep on waiting without responding to snoop requests, the deadlock cycle is pretty obvious
- So every controller must continue to respond to snoop requests while waiting for the bus for its own requests
- Normally the cache controller is designed as two separate, independent logic units, namely the inbound unit (handles snoop requests) and the outbound unit (handles the processor's own requests and arbitrates for the bus)
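- A minimal single-threaded sketch of this split (the queue and grant interfaces are illustrative assumptions): while the outbound unit is stalled waiting for the bus, the inbound unit keeps draining snoop requests, so P0 can still supply its modified copy of B while it waits to launch its BusRdX to A:

#include <cstdint>
#include <queue>

struct Snoop { uint64_t line_addr; };

struct CacheController {
    std::queue<Snoop> inbound;           // snoops observed on the bus
    bool own_request_pending = false;
    bool bus_granted         = false;

    // One "cycle" of the controller: the two units operate independently.
    void tick() {
        // Inbound unit: always service snoops, even while waiting for the bus.
        if (!inbound.empty()) {
            Snoop s = inbound.front();
            inbound.pop();
            respond_to_snoop(s);         // e.g. flush a modified copy, invalidate
        }
        // Outbound unit: keep arbitrating for our own pending request.
        if (own_request_pending && bus_granted) {
            launch_own_request();        // put BusRd/BusRdX on the bus
            own_request_pending = false;
        }
    }

    void respond_to_snoop(const Snoop&) { /* supply data / invalidate */ }
    void launch_own_request()           { /* drive the bus */ }
};

int main() {
    CacheController p0;
    p0.own_request_pending = true;       // P0 wants the bus for BusRdX to line A
    p0.inbound.push({0xB0});             // meanwhile P1's BusRd to line B arrives as a snoop
    p0.tick();                           // snoop to B is serviced even though the bus is not granted yet
    p0.bus_granted = true;
    p0.tick();                           // now P0's own request is launched
    return 0;
}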