Handling writebacks
- A writeback requires the evicting node to roll out of the sharing list
- The same holds for clean replacements
- Dirty eviction (requiring a data transaction to home) can happen only from the head node
- Requires the head node to roll out
- Clean eviction can happen from any node in the list
- A clean eviction does not require a transaction to home unless the evicting node's state is ONLY_FRESH or HEAD_FRESH
- ONLY_FRESH eviction changes the directory state from FRESH to HOME (i.e. no sharers)
- HEAD_FRESH eviction must update the head pointer in the directory (the directory state remains unchanged)
- A dirty eviction is completed before the miss that caused the eviction is initiated
- Rationale: this keeps complexity low, and RAC evictions are rare
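The eviction rules above can be summarized as a small decision function. This is only a sketch under stated assumptions, not the NUMA-Q implementation: `plan_eviction` and `EvictionPlan` are hypothetical names, and `MID` and `TAIL` are placeholder labels for the clean non-head states, which these notes do not name.

```python
from dataclasses import dataclass

# State names taken from the notes.
DIRTY_HEAD_STATES = {"ONLY_DIRTY", "HEAD_DIRTY"}
FRESH_HEAD_STATES = {"ONLY_FRESH", "HEAD_FRESH"}
# Placeholder names (assumption) for clean nodes that are not the head.
NON_HEAD_STATES = {"MID", "TAIL"}


@dataclass
class EvictionPlan:
    roll_out: bool          # every eviction requires a roll-out
    home_transaction: bool  # does the eviction involve the home node?
    data_to_home: bool      # is dirty data written back to home?
    directory_effect: str   # what the notes say happens at the directory


def plan_eviction(rac_state: str) -> EvictionPlan:
    """Map the evicting node's RAC state to the actions listed in the notes."""
    if rac_state in DIRTY_HEAD_STATES:
        # Dirty evictions only come from the head and carry data to home.
        return EvictionPlan(True, True, True, "not stated in the notes")
    if rac_state == "ONLY_FRESH":
        return EvictionPlan(True, True, False, "state FRESH -> HOME")
    if rac_state == "HEAD_FRESH":
        return EvictionPlan(True, True, False, "head pointer updated, state unchanged")
    if rac_state in NON_HEAD_STATES:
        # Clean eviction from a non-head node: home is not involved.
        return EvictionPlan(True, False, False, "none")
    raise ValueError(f"unknown RAC state: {rac_state}")


only_fresh = plan_eviction("ONLY_FRESH")
mid = plan_eviction("MID")
head_dirty = plan_eviction("HEAD_DIRTY")
```

Every plan has `roll_out` set, which mirrors the first two bullets: the evicting node leaves the list whether or not home has to be told.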
Roll-out protocol
- Some details about the roll-out mechanism
- CASE A: rolling out from the middle of the list
- Request-acknowledgment protocol between the victim and its upstream and downstream neighbors
- If a neighbor is in the PENDING state, it can NACK the roll-out request; the requester must then retry
- A problem arises when two adjacent nodes try to roll out simultaneously (nothing stops both nodes from replacing the same cache line at the same time)
- Both keep NACKing each other, leading to a livelock
- To break this cycle, the node closer to the tail is given priority (how do you know who is closer to the tail?)
- Neighbors may need to change RAC state depending on the situation (e.g. HEAD_DIRTY to ONLY_DIRTY or HEAD_FRESH to ONLY_FRESH)
- CASE B: rolling out from the head of the list
- The downstream neighbor must update its RAC state to reflect the fact that it is the new head
- Home should also be notified about the new head (the directory state may not always change)
- A problem arises when the head-change message reaching the home finds a totally new head already registered
- This means some other node is in the process of attaching itself to the head
- Home NACKs the roll-out
- The rolling-out node remains in the PENDING state and keeps retrying until the request from the new would-be head arrives
- At this point the list returns to a stable state and the roll-out can complete
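The mid-list roll-out of CASE A, including the tie-break between two adjacent victims, can be sketched as a doubly linked list. This is an illustrative model only: `Node`, `reply`, and `try_roll_out` are hypothetical names, PENDING here means only "is itself rolling out", and the check `neighbor.down is requester` is one plausible way a node could tell that the requester is closer to the tail (it compares the requester with its own downstream pointer). CASE B, which involves the home, is not modeled.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.state = "STABLE"   # "STABLE", "PENDING", or "INVALID" (simplified)
        self.up = None          # neighbor toward the head
        self.down = None        # neighbor toward the tail


def link(nodes):
    """Chain the nodes head-to-tail in the given order."""
    for a, b in zip(nodes, nodes[1:]):
        a.down, b.up = b, a


def reply(neighbor, requester):
    """Answer of `neighbor` to a roll-out request from `requester`."""
    if neighbor.state != "PENDING":
        return "ACK"
    # Both are rolling out: the node closer to the tail wins. The requester
    # is closer to the tail exactly when it is my downstream neighbor.
    if neighbor.down is requester:
        return "ACK"
    return "NACK"


def try_roll_out(victim):
    """One roll-out attempt; returns False if the victim must retry."""
    if victim.up is None or victim.down is None:
        raise ValueError("this sketch covers mid-list roll-out only")
    victim.state = "PENDING"
    if "NACK" in (reply(victim.up, victim), reply(victim.down, victim)):
        return False
    # Both neighbors acknowledged: splice the victim out of the list.
    victim.up.down = victim.down
    victim.down.up = victim.up
    victim.up = victim.down = None
    victim.state = "INVALID"
    return True


a, b, c, d = (Node(n) for n in "ABCD")
link([a, b, c, d])
b.state = c.state = "PENDING"   # B and C start rolling out at the same time

first = try_roll_out(b)    # C is closer to the tail, so it NACKs B
second = try_roll_out(c)   # B yields to C, so C leaves the list
third = try_roll_out(b)    # B retries and now succeeds
```

Without the priority check in `reply`, both B and C would answer NACK on every attempt, which is the livelock described above.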
Snoop interaction
- Interesting design problems arise due to limitations of the Pentium Pro quad
- The biggest problem is that the MESI protocol is designed for in-order response (so what?)
- The design had to use the deferred response signal for remote requests
- Lesson learned: for hierarchical protocols the bus must be split-transaction with out-of-order responses (what happens otherwise?)
- The snoop response is available after four cycles at the earliest
- Stall wire may be asserted by any processor unable to meet this four-cycle limit
- Bus controller samples the stall wire every two cycles
- The RAC and the directory (for local requests) are also looked up in parallel
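The timing implied by the four-cycle limit and the two-cycle sampling of the stall wire can be worked out with a short function. This is a sketch under an assumption the notes do not state explicitly: the first sample point is at cycle four and later sample points follow every two cycles. `snoop_result_cycle` and its parameters are hypothetical names.

```python
def snoop_result_cycle(ready_cycle, earliest=4, sample_period=2):
    """Cycle at which the bus controller first sees the stall wire released.

    ready_cycle: cycle at which the slowest snooper has its response ready.
    """
    cycle = earliest
    # While some snooper is not ready at a sample point, the stall wire is
    # asserted and the controller waits for the next sample point.
    while cycle < ready_cycle:
        cycle += sample_period
    return cycle


on_time = snoop_result_cycle(3)      # ready early: result still at cycle 4
one_late = snoop_result_cycle(5)     # misses cycle 4, seen at cycle 6
two_late = snoop_result_cycle(7)     # seen at cycle 8
```

Under this assumption a snooper that is late by a single cycle still costs a full two-cycle sampling interval.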
Protocol processor
- NUMA-Q runs protocols in microcode
- The protocol processor is customized with bit-field operations and has a three-stage, dual-issue pipeline
- It has a dedicated cache for holding recently accessed directory entries and RAC tags
- The protocol processor also contains three counters for monitoring performance
- These counters can be programmed through protocol code (i.e. they can be read and written)
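To illustrate why bit-field operations help protocol code, the sketch below extracts and inserts fields in a packed word, which is the kind of work done on directory entries and RAC tags. The field layout (a 2-bit state in bits 0-1 and a 6-bit head pointer in bits 2-7) is purely hypothetical and is not the NUMA-Q format; `extract_field` and `insert_field` are illustrative helper names.

```python
def extract_field(word, lsb, width):
    """Return the `width`-bit field of `word` starting at bit `lsb`."""
    return (word >> lsb) & ((1 << width) - 1)


def insert_field(word, lsb, width, value):
    """Return `word` with the field at `lsb` replaced by `value`."""
    mask = ((1 << width) - 1) << lsb
    return (word & ~mask) | ((value << lsb) & mask)


# Hypothetical layout: bits 0-1 hold the state, bits 2-7 hold the head pointer.
STATE_LSB, STATE_WIDTH = 0, 2
HEAD_LSB, HEAD_WIDTH = 2, 6

entry = 0
entry = insert_field(entry, STATE_LSB, STATE_WIDTH, 1)   # state code 1
entry = insert_field(entry, HEAD_LSB, HEAD_WIDTH, 5)     # head is node 5
entry = insert_field(entry, HEAD_LSB, HEAD_WIDTH, 9)     # new head: node 9

state = extract_field(entry, STATE_LSB, STATE_WIDTH)
head = extract_field(entry, HEAD_LSB, HEAD_WIDTH)
```

On a general-purpose processor each of these helpers takes several shift and mask instructions; a processor with native bit-field operations can presumably do the same work in fewer steps.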