
Handling writebacks
- Requires the evicting node to roll out
  - Same for clean replacements also
- Dirty eviction (requiring a data transaction to home) can happen only from the head node
  - Requires the head node to roll out
- Clean eviction can happen from any node in the list
  - Does not require a transaction to home unless the node's state is ONLY_FRESH or HEAD_FRESH
  - ONLY_FRESH eviction changes directory state from FRESH to HOME (i.e. no sharer)
  - HEAD_FRESH eviction must update the head pointer in the directory (directory state remains unchanged)
- A dirty eviction is completed before initiating the miss that caused the eviction
  - Rationale is low complexity, and RAC eviction is rare

Roll-out protocol
- Some details about the roll-out mechanism
- CASE A: Roll-out from the middle of the list
  - Request-acknowledgment protocol between the victim and its upstream and downstream neighbors
  - If one of the neighbors is in PENDING state it can NACK the roll-out request; the requester must retry
  - Problem arises when two adjacent nodes try to roll out simultaneously (nothing stops both nodes from replacing the same cache line at the same time)
  - Both will keep on NACKing each other, leading to a livelock
  - To break this cycle the node closer to the tail is given priority (how do you know who is closer to the tail?)
  - Neighbors may need to change RAC state depending on the situation (HEAD_DIRTY to ONLY_DIRTY or HEAD_FRESH to ONLY_FRESH)
- CASE B: Roll-out from the head of the list
  - Neighbor must update its RAC state to reflect the fact that it is the new head
  - Home should also be notified about the new head (directory state may not always change)
  - Problem arises when the head change message reaching the home finds a totally new head already registered
  - This means some other node is in the process of attaching itself to the head
  - Home NACKs the roll-out
  - Rolling-out node remains in PENDING state and keeps on retrying until the request from the new would-be head arrives
  - At this point the list goes back to a stable state and the roll-out can complete

Snoop interaction
- Interesting design problems arise due to limitations of the Pentium Pro quad
- The biggest problem is that the MESI protocol is designed for in-order response (so what?)
  - Had to use the deferred response signal for remote requests
  - Lesson learned: for hierarchical protocols the bus must be split-transaction with out-of-order response (what happens otherwise?)
- Snoop response is available after four cycles at the earliest
  - Stall wire may be asserted by any processor unable to meet this four-cycle limit
  - Bus controller samples the stall wire every two cycles
- RAC and directory (for local requests) are also looked up in parallel

Protocol processor
- NUMA-Q runs protocols in microcode
- The protocol processor is customized with bit-field operations and is a three-stage dual-issue pipeline
- Has a dedicated cache for holding recently accessed directory entries and RAC tags
- Protocol processor also contains three counters for monitoring performance
  - These counters can be programmed through protocol code (i.e. read and written to)
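
The eviction and roll-out rules above can be summarized in a small Python sketch. It is only an illustrative model, not NUMA-Q microcode: the function names, the `HomeAction` tuple, and the use of a globally visible head-first list are assumptions made for the example (a real node only sees its own neighbor pointers), and `MID_VALID` and `TAIL_VALID` are assumed names for clean non-head list members, which the notes do not name.

```python
from typing import NamedTuple

# States named in the notes for a head (or sole) node holding dirty data.
DIRTY_HEAD_STATES = {"ONLY_DIRTY", "HEAD_DIRTY"}


class HomeAction(NamedTuple):
    contact_home: bool     # is any transaction to the home node needed?
    carries_data: bool     # is it a data-carrying writeback?
    directory_effect: str  # what the notes say changes at the directory


def eviction_home_action(rac_state: str) -> HomeAction:
    """Classify an eviction by the victim's RAC state, following the notes."""
    if rac_state in DIRTY_HEAD_STATES:
        # Dirty eviction happens only from the head and sends data to home.
        return HomeAction(True, True, "data written back to home")
    if rac_state == "ONLY_FRESH":
        return HomeAction(True, False, "directory state FRESH -> HOME")
    if rac_state == "HEAD_FRESH":
        return HomeAction(True, False, "head pointer updated, state unchanged")
    # Any other clean list member (MID_VALID / TAIL_VALID are assumed names).
    return HomeAction(False, False, "none")


def state_when_alone(head_state: str) -> str:
    """New RAC state of a head whose only other sharer has rolled out."""
    return {"HEAD_DIRTY": "ONLY_DIRTY", "HEAD_FRESH": "ONLY_FRESH"}[head_state]


def rollout_winner(sharers: list, a: str, b: str) -> str:
    """Tie-break for two adjacent victims: the one closer to the tail wins.

    `sharers` is ordered head first, tail last (a modelling assumption).
    """
    ia, ib = sharers.index(a), sharers.index(b)
    if abs(ia - ib) != 1:
        raise ValueError("nodes are not adjacent in the sharing list")
    return a if ia > ib else b


if __name__ == "__main__":
    for s in ("ONLY_DIRTY", "HEAD_FRESH", "MID_VALID"):
        print(s, eviction_home_action(s))
    print(rollout_winner(["N3", "N1", "N7", "N2"], "N1", "N7"))
```

The sketch deliberately leaves out the PENDING state, the NACK/retry loop, and the race with a new would-be head in CASE B; it only captures which evictions involve the home node and which of two colliding roll-outs proceeds first.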