Module 14: "Directory-based Cache Coherence"
  Lecture 32: "Protocol Occupancy and Directory Controllers"
 

Flexible protocol engine

  • Software protocol
    • Executes short sequences of instructions or micro-code known as protocol handlers on a processor
    • Each message type has a separate handler
    • Can make the protocol complicated
    • Allows late binding of the protocol: the appropriate protocol can be chosen after the hardware is built, and the verification path is easier
    • Normally higher occupancy than hardwired controllers if controller clock is slow
    • Protocol processor may use separate protocol data and code caches to speed up protocol processing
  • Four existing designs
    • Customized coprocessor embedded in memory controller
      • ISA designed to include bit field operations: helpful for directory manipulation (bit clear, bit set, branch on bit clear, branch on bit set, find first set bit, etc.)
      • Processor is normally simple, e.g., short pipeline, in-order, no FP unit or multiplier/divider
      • Examples: Stanford FLASH, Sun S3.mp, Alpha Piranha CMP, Sequent STiNG, Sequent NUMA-Q
    • General purpose processor embedded in memory controller
      • Uses commodity processor cores
      • May be wasteful of resources
      • Normally higher occupancy than customized coprocessor if memory clock is slow
      • Example: Wisconsin Typhoon
    • Execute on main processor
      • Interrupt the main processor to execute coherence protocol on cache miss or network message arrival
      • Needs an extremely low overhead interrupt mechanism to be competitive
      • Example: Grahn and Stenström (1995)
    • Execute on spare hardware thread context of multi-threaded (or hyper-threaded) processors
      • No interrupt overhead
      • Reserve a protocol thread context
      • Application and protocol threads co-exist in the processor (no context switch needed)
      • Example: Chaudhuri and Heinrich (2004)
      • Details are deferred until SMT/HT has been covered
  • Possible future design
    • Devote a core to protocol processing in multi-core architectures (Kalamkar, Chaudhuri, and Heinrich, 2007)
    • Increasingly attractive as number of cores increases
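
The outline above says that a software protocol runs one handler per message type and that a protocol ISA benefits from bit field operations for directory manipulation. A minimal sketch of both ideas follows. It is an illustration only, not the protocol of any machine named above: the message types, the `DirEntry` layout, the action tuples, and all function names are hypothetical, and the protocol is deliberately simplified (no transient states, no NACKs, no races).

```python
from enum import Enum, auto


class Msg(Enum):
    """Hypothetical message types; a real protocol has many more."""
    READ = auto()       # request for a shared copy
    READ_EX = auto()    # request for an exclusive copy
    WRITEBACK = auto()  # owner returns a dirty line to memory


class DirEntry:
    """Assumed directory entry: sharer bit vector plus a dirty flag."""
    def __init__(self):
        self.sharers = 0    # bit i set => node i holds a copy
        self.dirty = False  # True => exactly one sharer, and it owns the line


# Bit field primitives of the kind a protocol ISA might provide directly.
def bit_set(v, i):
    return v | (1 << i)


def bit_clear(v, i):
    return v & ~(1 << i)


def find_first_set(v):
    """Index of the lowest set bit, or -1 if no bit is set."""
    return (v & -v).bit_length() - 1


def handle_read(entry, src):
    actions = []
    if entry.dirty:
        # Forward the request to the current owner; it supplies the data.
        actions.append(("intervene", find_first_set(entry.sharers)))
        entry.dirty = False
    else:
        actions.append(("reply_from_memory", src))
    entry.sharers = bit_set(entry.sharers, src)
    return actions


def handle_read_ex(entry, src):
    actions = []
    # Simplification: every other holder is invalidated, owner or sharer.
    others = bit_clear(entry.sharers, src)
    while others:
        node = find_first_set(others)
        actions.append(("invalidate", node))
        others = bit_clear(others, node)
    entry.sharers = bit_set(0, src)
    entry.dirty = True
    actions.append(("reply_exclusive", src))
    return actions


def handle_writeback(entry, src):
    entry.sharers = bit_clear(entry.sharers, src)
    entry.dirty = False
    return [("ack", src)]


# One handler per message type, selected by table lookup.
HANDLERS = {
    Msg.READ: handle_read,
    Msg.READ_EX: handle_read_ex,
    Msg.WRITEBACK: handle_writeback,
}


def dispatch(entry, msg, src):
    """Run the handler for this message and return the actions to send."""
    return HANDLERS[msg](entry, src)
```

Calling `dispatch(entry, Msg.READ_EX, 5)` on an entry whose sharer vector has bits 2 and 5 set walks the vector with `find_first_set`, emits an invalidation for node 2, and leaves node 5 as the sole dirty owner. The time spent inside each handler is the occupancy the lecture refers to, which is why single-instruction bit operations matter on a slow protocol processor.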