Module 14: "Directory-based Cache Coherence"
  Lecture 33: "SCI Protocol"
 

Handling writebacks

  • Requires the evicting node to roll out
    • The same holds for clean replacements
    • Dirty eviction (requiring a data transaction to home) can happen only from the head node
      • Requires the head node to roll out
    • Clean eviction can happen from any node in the list
      • Does not require a transaction to home unless its state is ONLY_FRESH or HEAD_FRESH
      • ONLY_FRESH eviction changes directory state from FRESH to HOME (i.e. no sharer)
      • HEAD_FRESH eviction must update the head pointer in directory (directory state remains unchanged)
    • A dirty eviction is completed before the miss that triggered it is initiated
      • Rationale: this keeps complexity low, and RAC evictions are rare
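
The eviction rules above can be summarized as a small decision function. This is only a sketch of the bullets, not NUMA-Q microcode: the function name `plan_eviction`, the dictionary keys, and the placeholder state `VALID` (standing for any clean copy that is not the head) are assumptions made for illustration. The notes do not say how the directory state changes on a dirty eviction, so the sketch leaves that open.

```python
# Head states below are the ones named in the notes. "VALID" is a placeholder
# (an assumption of this sketch) for any clean copy that is not the head.
DIRTY_HEAD = {"ONLY_DIRTY", "HEAD_DIRTY"}


def plan_eviction(rac_state):
    """Return what evicting a line in `rac_state` involves, per the bullets above."""
    # Every eviction, clean or dirty, rolls the node out of the sharing list.
    plan = {"roll_out": True, "home_transaction": False,
            "sends_data": False, "directory_effect": "none"}
    if rac_state in DIRTY_HEAD:
        # Dirty eviction: only possible at the head, data goes to home.
        plan.update(home_transaction=True, sends_data=True,
                    directory_effect="not specified in the notes")
    elif rac_state == "ONLY_FRESH":
        # Last sharer leaves: directory goes from FRESH to HOME.
        plan.update(home_transaction=True, directory_effect="FRESH -> HOME")
    elif rac_state == "HEAD_FRESH":
        # Home must learn the new head; directory state stays as it is.
        plan.update(home_transaction=True,
                    directory_effect="head pointer updated, state unchanged")
    elif rac_state != "VALID":
        raise ValueError("state not covered by this sketch: " + rac_state)
    return plan


for state in ("ONLY_FRESH", "HEAD_FRESH", "HEAD_DIRTY", "VALID"):
    print(state, plan_eviction(state))
```

Only the head states ever produce a transaction to home; a clean copy elsewhere in the list leaves by talking to its neighbors alone.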

Roll-out protocol

  • Some details about the roll-out mechanism
    • CASE A: rolling out from the middle of the list
      • Request-acknowledgment protocol between the victim and its upstream and downstream neighbors
      • If one of the neighbors is in PENDING state it can NACK the roll-out request; the requester must retry
      • A problem arises when two adjacent nodes try to roll out simultaneously (nothing stops both nodes from replacing the same cache line at the same time)
        • Both will keep NACKing each other, leading to a livelock
        • To break this cycle, the node closer to the tail is given priority (how do you know who is closer to the tail?)
      • Neighbors may need to change RAC state depending on situation (HEAD_DIRTY to ONLY_DIRTY or HEAD_FRESH to ONLY_FRESH)
    • CASE B: rolling out from the head of the list
      • Neighbor must update RAC state to reflect the fact that it is the new head
      • Home also should be notified about the new head (directory state may not always change)
      • A problem arises when the head-change message reaches the home and finds a different head already registered
        • This means some other node is in the process of attaching itself to the head of the list
        • Home NACKs the roll-out
        • Rolling out node remains in PENDING state and keeps on retrying until the request from the new would-be head arrives
        • At this point the list returns to a stable state and the roll-out can complete
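
Both roll-out cases can be sketched on a doubly linked sharing list. This is a simplified model under stated assumptions, not the actual SCI or NUMA-Q message sequence: the `Node` class, `make_list`, `try_roll_out`, the `home` dictionary, and the placeholder states `VALID` (any non-head copy) and `INVALID` are all hypothetical names. The sketch resolves the tie between two adjacent rolling-out nodes by letting the downstream neighbor win, since in this model the downstream pointer leads toward the tail. Passing the FRESH/DIRTY suffix to the new head is also a simplifying assumption.

```python
class Node:
    def __init__(self, name, state):
        self.name = name
        self.state = state        # RAC state of this node's copy
        self.up = None            # neighbor toward the head
        self.down = None          # neighbor toward the tail
        self.pending = False      # busy with some other transaction
        self.rolling_out = False  # currently trying to leave the list


def make_list(home, nodes):
    """Link nodes head-to-tail and register the head at the home."""
    for a, b in zip(nodes, nodes[1:]):
        a.down, b.up = b, a
    home["head"] = nodes[0]


def try_roll_out(victim, home):
    """One roll-out attempt. Returns False when NACKed (the caller retries)."""
    victim.rolling_out = True
    up, down = victim.up, victim.down
    # A neighbor in PENDING state NACKs the request.
    if any(n is not None and n.pending for n in (up, down)):
        return False
    # Tie-break: the downstream neighbor is closer to the tail, so it goes first.
    if down is not None and down.rolling_out:
        return False
    if up is None:
        # CASE B: the victim is the head, so the home must be notified.
        if home["head"] is not victim:
            return False  # home NACKs: a different head is already registered
        home["head"] = down
        if down is None:
            if home["state"] == "FRESH":
                home["state"] = "HOME"  # ONLY_FRESH eviction: no sharer left
        else:
            kind = victim.state.split("_")[1]  # assumption: new head inherits FRESH/DIRTY
            down.state = ("HEAD_" if down.down else "ONLY_") + kind
            down.up = None
    else:
        # CASE A: splice the victim out between its two neighbors.
        up.down = down
        if down is not None:
            down.up = up
        elif up.up is None:
            # The head is now the only node left: HEAD_x becomes ONLY_x.
            up.state = up.state.replace("HEAD_", "ONLY_")
    victim.up = victim.down = None
    victim.rolling_out = False
    victim.state = "INVALID"
    return True


home = {"head": None, "state": "FRESH"}
a, b, c = Node("A", "HEAD_FRESH"), Node("B", "VALID"), Node("C", "VALID")
make_list(home, [a, b, c])
b.rolling_out = c.rolling_out = True   # B and C evict the line at the same time
print(try_roll_out(b, home))           # B defers to C, which is closer to the tail
print(try_roll_out(c, home))           # C leaves first
print(try_roll_out(b, home), a.state)  # B's retry succeeds; A is now alone
```

Because exactly one of two adjacent nodes is downstream of the other, the priority rule always lets one of them proceed, which is what breaks the mutual-NACK cycle.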

Snoop interaction

  • Interesting design problems arise due to limitations of the Pentium Pro quad
    • The biggest problem is that the MESI protocol is designed for in-order response (so what?)
    • The design had to use the deferred response signal for remote requests
      • Lesson learned: for hierarchical protocols the bus must be split-transaction with out-of-order responses (what happens otherwise?)
    • The snoop response is available after four cycles at the earliest
      • Stall wire may be asserted by any processor unable to meet this four-cycle limit
      • Bus controller samples the stall wire every two cycles
    • RAC and directory (for local requests) are also looked up in parallel
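
The snoop timing described above can be illustrated with a few lines of arithmetic. This is a sketch under assumptions that go beyond the notes: it supposes that the first sample point is at cycle four after the request, that later sample points follow every two cycles from there, and that an agent asserts the stall wire at a sample point exactly when its snoop result is not yet ready. The function name `snoop_result_cycle` and its parameters are hypothetical.

```python
def snoop_result_cycle(ready_cycles, earliest=4, sample_period=2):
    """Cycle (counted from the request) at which the snoop result is accepted.

    ready_cycles: for each snooping agent, the cycle at which its result is ready.
    """
    cycle = earliest
    # At each sample point, any agent that is not ready yet asserts stall,
    # and the bus controller looks again one sample period later.
    while any(ready > cycle for ready in ready_cycles):
        cycle += sample_period
    return cycle


print(snoop_result_cycle([3, 4]))     # everyone meets the four-cycle limit
print(snoop_result_cycle([4, 7, 2]))  # one slow agent stretches the snoop phase
```

In this model a single slow agent delays the result for everyone, and the delay is always rounded up to the next sample point.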

Protocol processor

  • NUMA-Q runs protocols in microcode
    • The protocol processor is customized with bit-field operations and has a three-stage, dual-issue pipeline
    • It has a dedicated cache for holding recently accessed directory entries and RAC tags
    • Protocol processor also contains three counters for monitoring performance
      • These counters can be programmed through protocol code (i.e. both read and written)