Inherently non-atomic
- Even though the bus is atomic, a complete protocol transaction involves quite a few steps which together form a non-atomic transaction
- Issuing processor request
- Looking up cache tags
- Arbitrating for bus
- Snoop action in other cache controller
- Refill in requesting cache controller at the end
- Different requests from different processors may be in different phases of their transactions at the same time
- This makes a protocol transition inherently non-atomic
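- A minimal C++ sketch of the phases listed above (the enum and the snapshot are purely illustrative assumptions, not any real controller's interface), showing how two processors' requests can sit in different phases at the same instant:

#include <cstdio>

// Hypothetical phases of one snoopy-bus coherence transaction.
enum class Phase {
    IssueRequest,    // processor issues the load/store
    TagLookup,       // controller looks up its cache tags
    BusArbitration,  // controller arbitrates for the bus
    SnoopAction,     // other controllers snoop and act
    Refill           // requesting controller refills the line
};

int main() {
    // Snapshot at one instant: P0 and P1 are in different phases of
    // their own transactions, which is why a transition is non-atomic.
    Phase p0 = Phase::BusArbitration;
    Phase p1 = Phase::SnoopAction;
    std::printf("P0 in phase %d, P1 in phase %d\n",
                static_cast<int>(p0), static_cast<int>(p1));
    return 0;
}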
- Consider an example
- P0 and P1 have cache line C in shared state
- Both proceed to write the line
- Both cache controllers look up the tags, put a BusUpgr into the bus request queue, and start arbitrating for the bus
- P1 gets the bus first and launches its BusUpgr
- P0 observes the BusUpgr and now it must invalidate C in its cache and change the request type to BusRdX
- So every cache controller needs to do an associative lookup of the snoop address against its pending request queue and, depending on the request type, take the appropriate action (a sketch follows this list)
- One way to reason about the correctness is to introduce transient states
- It is possible to think of the previous example as line C being in a transient S→M state
- On observing a BusUpgr or BusRdX, this state transitions to I→M, which is also transient
- The line C goes to stable M state only after the transaction completes
- These transient states are not really encoded in the state bits of a cache line because at any point in time there will be a small number of outstanding requests from a particular processor (today the maximum I know of is 16)
- These states are really determined by the state of an outstanding line and the state of the cache controller
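- A minimal software sketch, assuming illustrative names (PendingRequest, OutboundQueue, on_snoop are not a real controller's interface), of how the controller could match a snooped address against its small pending-request queue and patch a transient S→M request into I→M with a BusRdX:

#include <cstdint>
#include <vector>

enum class BusOp { BusRd, BusRdX, BusUpgr };

// Transient states: the line is "in flight" between two stable states.
enum class Transient { S_to_M, I_to_M, I_to_S };

struct PendingRequest {
    uint64_t  line_addr;   // cache line address of the outstanding request
    BusOp     op;          // bus transaction we intend to launch
    Transient state;       // transient state of the line while we wait
};

struct OutboundQueue {
    std::vector<PendingRequest> pending;   // small: e.g. up to ~16 entries

    // Called for every snooped bus transaction: associatively match the
    // snoop address against our own pending requests.
    void on_snoop(uint64_t snoop_addr, BusOp snoop_op) {
        for (auto& req : pending) {
            if (req.line_addr != snoop_addr) continue;
            // Another processor invalidated the line we were upgrading:
            // our copy is gone, so a BusUpgr is no longer sufficient.
            if (req.state == Transient::S_to_M &&
                (snoop_op == BusOp::BusUpgr || snoop_op == BusOp::BusRdX)) {
                req.state = Transient::I_to_M;   // still transient
                req.op    = BusOp::BusRdX;       // must now fetch the data too
            }
        }
    }
};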
Write serialization
- The atomic bus makes write serialization rather easy, but optimizations are possible
- Consider a processor write to a shared cache line
- Is it safe to continue with the write and change the state to M even before the bus transaction is complete?
- After the bus transaction is launched it is totally safe because the bus is atomic and hence the position of the write is committed in the total order; therefore no need to wait any further (note that the exact point in time when the other caches invalidate the line is not important)
- If the processor decides to proceed even before the bus transaction is launched (very much possible with out-of-order execution), the cache controller must take the responsibility of squashing and re-executing the offending instructions so that the total order is consistent across the system
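- A minimal sketch of this decision point, assuming hypothetical names (UpgradeStatus, may_complete_write) just to make the reasoning concrete:

enum class LineState { I, S, M };

struct UpgradeStatus {
    bool bus_txn_launched;   // BusUpgr/BusRdX has won arbitration and appeared on the bus
    bool bus_txn_complete;   // all snoopers have acted and the transaction retired
};

// Returns true if the store may retire and the line may flip to M.
bool may_complete_write(const UpgradeStatus& s, LineState& line) {
    if (s.bus_txn_launched) {
        // The atomic bus has already fixed this write's position in the
        // total order; the exact moment other caches invalidate is irrelevant.
        line = LineState::M;
        return true;
    }
    // Proceeding before launch is possible (e.g. with out-of-order execution),
    // but then the controller must be prepared to squash and re-execute the
    // store if a conflicting bus transaction is observed first.
    return false;
}

int main() {
    LineState line = LineState::S;
    UpgradeStatus before_launch{false, false};
    UpgradeStatus after_launch{true, false};
    bool early = may_complete_write(before_launch, line);  // false: must wait or be ready to squash
    bool late  = may_complete_write(after_launch,  line);  // true: position in total order is fixed
    return (!early && late && line == LineState::M) ? 0 : 1;
}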
Fetch deadlock
- Just a fancy name for a pretty intuitive deadlock
- Suppose P0’s cache controller is waiting to get the bus for launching a BusRdX to cache line A
- P1 has a modified copy of cache line A
- P1 has launched a BusRd to cache line B and awaiting completion
- P0 has a modified copy of cache line B
- If both keep on waiting without responding to snoop requests, the deadlock cycle is pretty obvious
- So every controller must continue to respond to snoop requests while waiting for the bus for its own requests
- Normally the cache controller is designed as two separate, independent logic units, namely the inbound unit (handles snoop requests) and the outbound unit (handles the processor's own requests and arbitrates for the bus)
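- A minimal single-threaded sketch of this split (the queue and grant interfaces are illustrative assumptions): while the outbound unit is stalled waiting for the bus, the inbound unit keeps draining snoop requests, so P0 can still supply its modified copy of B while it waits to launch its BusRdX to A:

#include <cstdint>
#include <queue>

struct Snoop { uint64_t line_addr; };

struct CacheController {
    std::queue<Snoop> inbound;           // snoops observed on the bus
    bool own_request_pending = false;
    bool bus_granted         = false;

    // One "cycle" of the controller: the two units operate independently.
    void tick() {
        // Inbound unit: always service snoops, even while waiting for the bus.
        if (!inbound.empty()) {
            Snoop s = inbound.front();
            inbound.pop();
            respond_to_snoop(s);         // e.g. flush a modified copy, invalidate
        }
        // Outbound unit: keep arbitrating for our own pending request.
        if (own_request_pending && bus_granted) {
            launch_own_request();        // put BusRd/BusRdX on the bus
            own_request_pending = false;
        }
    }

    void respond_to_snoop(const Snoop&) { /* supply data / invalidate */ }
    void launch_own_request()           { /* drive the bus */ }
};

int main() {
    CacheController p0;
    p0.own_request_pending = true;       // P0 wants the bus for BusRdX to line A
    p0.inbound.push({0xB0});             // meanwhile P1's BusRd to line B arrives as a snoop
    p0.tick();                           // snoop to B is serviced even though the bus is not granted yet
    p0.bus_granted = true;
    p0.tick();                           // now P0's own request is launched
    return 0;
}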