Module 14: "Directory-based Cache Coherence"
  Lecture 33: "SCI Protocol"
 

Sequent NUMA-Q

  • Implements the IEEE SCI directory protocol
    • Each node is a four-processor Intel Pentium Pro SMP (a quad)
    • The IQ-Link board connects to the system bus and implements the directory protocol
      • Also contains a 32 MB, 4-way set-associative remote access cache (RAC)
    • Processors within a node are kept coherent by the snoop-based MESI protocol already implemented in the Pentium Pro quad
    • The SCI protocol keeps the RACs coherent across nodes
    • The RAC maintains inclusion with the processor caches
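
As a quick check on the RAC geometry, the number of sets follows from the capacity, the associativity, and the line size. The slides do not state the line size, so the 64-byte value below is an assumption, and the helper name `num_sets` is made up for this sketch.

```python
def num_sets(capacity_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache of the given geometry."""
    lines, remainder = divmod(capacity_bytes, line_bytes)
    if remainder or lines % ways:
        raise ValueError("capacity must be a whole number of sets")
    return lines // ways


RAC_BYTES = 32 * 1024 * 1024   # 32 MB, from the slide
RAC_WAYS = 4                   # 4-way set associative, from the slide
LINE_BYTES = 64                # ASSUMPTION: line size is not given in the slides

sets = num_sets(RAC_BYTES, RAC_WAYS, LINE_BYTES)
index_bits = sets.bit_length() - 1   # valid because `sets` is a power of two here
print(sets, index_bits)
```

Under that assumed line size the RAC would have 131072 sets, indexed by 17 address bits; a different line size changes both numbers.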

SCI protocol

  • Directory structure
    • Home contains the id of the most recently queued sharer or the owner (6 bits)
  • Sharing list
    • A sharer contains the id of the next sharer and the previous sharer
    • The last sharer contains the id of the home node and of the previous sharer
    • A circular doubly linked list
  • Three major states in directory
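    • Home: no remote node holds the block, but it may be cached in the local (home) quad
    • Fresh: similar to shared; the sharing list holds read-only copies and memory is up to date
    • Gone: some node has exclusive ownership; memory is stale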
  • Cache states
    • Processor cache: MESI
    • RAC: 29 stable states and many transient states
      • 7 bits for representing RAC state
      • Two-part naming of RAC state: first part says the location of the block in the list (ONLY, HEAD, TAIL, MID), second part mentions the actual state (modified, exclusive, fresh, copy, …)
      • We will use some of these to understand the basics of SCI (the full description is available in the IEEE SCI standard)
        • HEAD_DIRTY, TAIL_CLEAN, etc.
  • Three major operations on the list
    • List construction: involves adding a new sharer to the list
    • Rollout: remove a sharer from the list; must synchronize with immediate neighbors
    • Purge/invalidate: the head node always has write permission, so it can purge the entire list before writing; only the head node has this privilege
  • Three classes of protocol
    • Minimal SCI: sharing not allowed
    • Typical SCI (discussed here): supports the features normally expected of a coherence protocol, including sharing
    • Full SCI: many optimizations, including hardware support for synchronization
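
The directory pointer, the doubly linked sharing list, and the three list operations described above can be sketched as a toy model. This is a minimal illustration, not the SCI specification: it ignores messages, transient states, data transfer, and the RAC state names. The class `SharingList`, the `HOME` sentinel, and all method names are hypothetical, and leaving the head's previous pointer empty is a simplification made for this sketch.

```python
HOME = "home"  # hypothetical sentinel standing in for the home node's id


class SharingList:
    """Toy model of the sharing list for a single memory block."""

    def __init__(self):
        self.dir_state = "HOME"  # directory state: HOME, FRESH, or GONE
        self.head = None         # home's pointer to the most recently queued sharer
        self.next = {}           # node id -> next sharer (towards the tail)
        self.prev = {}           # node id -> previous sharer (towards the head)

    def add_sharer(self, node):
        """List construction: a new sharer is queued at the head."""
        old_head = self.head
        # The last sharer points back to home, as in the slide.
        self.next[node] = HOME if old_head is None else old_head
        self.prev[node] = None   # simplification: head has no previous sharer
        if old_head is not None:
            self.prev[old_head] = node
        self.head = node
        if self.dir_state == "HOME":
            self.dir_state = "FRESH"

    def rollout(self, node):
        """Rollout: unlink a sharer by patching its immediate neighbours."""
        nxt = self.next.pop(node)
        prv = self.prev.pop(node)
        if prv is None:                      # node was the head
            self.head = None if nxt == HOME else nxt
        else:
            self.next[prv] = nxt
        if nxt != HOME:
            self.prev[nxt] = prv
        if self.head is None:                # list is empty again
            self.dir_state = "HOME"

    def purge(self, writer):
        """Purge: the head invalidates every other sharer before writing."""
        if writer != self.head:
            raise PermissionError("only the head node may purge the list")
        invalidated = []
        victim = self.next[writer]
        while victim != HOME:
            following = self.next.pop(victim)
            self.prev.pop(victim)
            invalidated.append(victim)
            victim = following
        self.next[writer] = HOME             # writer is now the only node
        self.dir_state = "GONE"              # memory is stale from here on
        return invalidated

    def sharers(self):
        """Walk the list from head to tail."""
        out, node = [], self.head
        while node is not None and node != HOME:
            out.append(node)
            node = self.next[node]
        return out


blk = SharingList()
for node_id in (3, 7, 5):
    blk.add_sharer(node_id)
print(blk.sharers())      # [5, 7, 3]: newest sharer is the head
blk.rollout(7)
print(blk.sharers())      # [5, 3]
print(blk.purge(5))       # [3]
print(blk.dir_state)      # GONE
```

Note that a node in the middle or at the tail cannot call `purge` in this model; it would first have to roll out and rejoin at the head, which mirrors the rule that only the head may invalidate the list.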