Module 14: "Directory-based Cache Coherence"
  Lecture 33: "SCI Protocol"
 

Directory overhead

  • Directory overhead
    • Need 6 bits to maintain the head node id
      • NUMA-Q scales up to 64 nodes
    • Need 2 bits for encoding three states: HOME, FRESH, GONE
    • A system with P nodes, M bytes of memory, and cache block size of B bytes has M/B cache blocks per node
      • 2 + log(P) bits needed per directory entry, one entry per cache block (log is base 2)
      • Total overhead = (M/B)*(log(P) + O(1))*P bits
    • For fixed M and B, total directory overhead grows as O(P*log(P))
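
The overhead formula above can be checked with a few lines of arithmetic. This is a minimal sketch, not a description of the NUMA-Q hardware: the 1 GiB of memory per node and the 64-byte block size are assumptions chosen only to make the numbers concrete, and the function names are hypothetical.

```python
def directory_entry_bits(num_nodes, state_bits=2):
    """Bits per directory entry: state bits plus a head pointer of ceil(log2(P)) bits."""
    head_ptr_bits = (num_nodes - 1).bit_length()  # integer ceil(log2(P)) for P >= 1
    return state_bits + head_ptr_bits


def directory_overhead_bits(num_nodes, mem_bytes_per_node, block_bytes):
    """Total directory bits in the machine: (M/B) * (log(P) + 2) * P."""
    blocks_per_node = mem_bytes_per_node // block_bytes
    return blocks_per_node * directory_entry_bits(num_nodes) * num_nodes


# Assumed configuration: 64 nodes, 1 GiB per node, 64-byte blocks.
total_bits = directory_overhead_bits(64, 2**30, 64)
print(directory_entry_bits(64), "bits per entry")
print(total_bits // 8, "bytes of directory state in total")
```

With these assumed numbers each entry costs 8 bits against a 512-bit block, so the directory adds roughly 1/64 of the memory size.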

Cache overhead

  • Extended RAC tags for storing upstream and downstream pointers
    • 2*log(P) bits per cache block
    • Total increase in tag DRAM area across all P nodes is O(P*log(P))
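
The pointer cost per RAC block can be worked out the same way. This is only a sketch: the RAC size used in the example (2**19 blocks per node) is an assumed figure for illustration, and the function names are hypothetical.

```python
def rac_pointer_bits(num_nodes):
    """Extra tag bits per RAC block: one upstream and one downstream pointer."""
    ptr_bits = (num_nodes - 1).bit_length()  # integer ceil(log2(P)) for P >= 1
    return 2 * ptr_bits


def total_rac_pointer_bits(num_nodes, rac_blocks_per_node):
    """Pointer bits summed over every RAC block in every node."""
    return num_nodes * rac_blocks_per_node * rac_pointer_bits(num_nodes)


# Assumed configuration: 64 nodes, 2**19 RAC blocks per node.
print(rac_pointer_bits(64), "pointer bits per RAC block")
print(total_rac_pointer_bits(64, 2**19) // 8, "bytes of pointer storage in total")
```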

Handling read miss

  • On a miss in both the RAC and the quad snoop, the requester sends a read request to the home
    • Allocates a block in RAC and marks its state PENDING
    • CASE A: directory state is HOME
      • Change directory state to FRESH
      • Change head pointer to requester id
      • Send reply to requester
      • Requester fills cache block in RAC, forwards it to requesting processor, changes RAC block state to ONLY_FRESH
    • CASE B: directory state is FRESH
      • Home changes head pointer to requester id
      • Sends reply with data read from memory and the old head node id
      • Requester sends a request to the previous head expressing intention to become the new head
      • Old head changes its upstream pointer to point to the requester and its RAC state to TAIL_VALID (if it was the only sharer) or MID_VALID (if it already had a downstream node); sends an acknowledgment to requester
      • Requester changes its downstream pointer to old head and upstream pointer to home; also changes RAC line state to HEAD_FRESH
      • Observe the strict request-reply nature of the protocol
    • CASE C: directory state is GONE
      • Means head node has an exclusive copy of the cache line
      • Home replies to the requester with the head node id, but does not change the state of the directory
      • Requester keeps its RAC line in PENDING state and sends a data request to the head node
      • Old head changes RAC line state to TAIL_VALID, sets its upstream pointer to the requester, and sends data to requester
      • Requester sets RAC line state to HEAD_DIRTY, sets its upstream pointer to home and downstream pointer to old head
      • Note that directory remains in GONE state and memory is not updated (similar to an M to O transition)
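
The three cases above can be sketched as a small state update. This is a simplified model under stated assumptions, not the SCI specification: all messages are collapsed into one sequential function call (so no races occur), the `INVALID` initial RAC state and the `HOME` sentinel are hypothetical names, the choice between MID_VALID and TAIL_VALID in case B is assumed to depend on whether the old head was ONLY_FRESH, and case C is assumed to update the head pointer even though the directory state stays GONE.

```python
from dataclasses import dataclass
from typing import Dict, Optional

HOME = "home"  # hypothetical sentinel: upstream pointer refers to the home node


@dataclass
class Directory:
    state: str = "HOME"          # HOME, FRESH, or GONE
    head: Optional[str] = None   # id of the current head node


@dataclass
class RacLine:
    state: str = "INVALID"       # assumed initial state, not named in the notes
    upstream: Optional[str] = None
    downstream: Optional[str] = None


def read_miss(d: Directory, racs: Dict[str, RacLine], req: str) -> None:
    """Apply one read miss from node `req`, with no concurrent requests."""
    line = racs[req]
    line.state = "PENDING"
    old_head = d.head
    if d.state == "HOME":                      # CASE A
        d.state, d.head = "FRESH", req
        line.state = "ONLY_FRESH"
        return
    old = racs[old_head]
    d.head = req                               # requester becomes the new head
    old.upstream = req
    line.upstream, line.downstream = HOME, old_head
    if d.state == "FRESH":                     # CASE B
        old.state = "TAIL_VALID" if old.state == "ONLY_FRESH" else "MID_VALID"
        line.state = "HEAD_FRESH"
    else:                                      # CASE C: directory stays GONE
        old.state = "TAIL_VALID"
        line.state = "HEAD_DIRTY"


racs = {n: RacLine() for n in ("X", "Y", "Z")}
d = Directory()
for node in ("X", "Y", "Z"):
    read_miss(d, racs, node)
for node, line in racs.items():
    print(node, line)
```

Three successive reads from X, Y, and Z build the sharing list Z, Y, X from head to tail, which is the order in which the upstream and downstream pointers link the nodes.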
  • Handling races
    • Suppose that when the requester’s (say B) message reaches the old head (say A), A’s RAC line is still in PENDING state
    • SCI has no pending state in the directory and largely avoids NACKs (it does use them, but only in a small number of cases)
    • B does become the new head (has to because the home has already updated the directory), but inherits the PENDING state from A
    • Any subsequent request will come to B and will become the new pending head
    • Ultimately the PENDING state is resolved along the chain starting from A upstream
    • FIFO nature of the pending list guarantees fairness
    • Also, no problem related to sizing the buffers for holding pending requests (no extra space needed)
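
The pending list described above can be sketched as a linked chain. This is an illustrative model only: node names and function names are hypothetical, and it assumes that each request reaches the home while all earlier requesters are still PENDING, and that resolution proceeds from the oldest requester toward the newest, as the notes state.

```python
def build_pending_chain(arrival_order):
    """Link requesters in the order their requests reach the home.

    The home hands each requester the previous head id and immediately
    records the requester as the new head, so no queue is kept at the home.
    """
    head = None
    downstream = {}
    for node in arrival_order:
        downstream[node] = head  # new pending head points at the old head
        head = node
    return head, downstream


def resolution_order(head, downstream):
    """Walk from the newest head down to the oldest, then resolve oldest first."""
    chain = []
    node = head
    while node is not None:
        chain.append(node)
        node = downstream[node]
    return list(reversed(chain))


head, downstream = build_pending_chain(["X", "Y", "Z"])
print("head at directory:", head)
print("resolved in order:", resolution_order(head, downstream))
```

The only storage used is one pointer per requester, which already exists in each RAC tag; this is why no separate buffer has to be sized for pending requests.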