Module 14: "Directory-based Cache Coherence"
  Lecture 32: "Protocol Occupancy and Directory Controllers"
 

Virtual networks: case studies

  • Each virtual network consists of a network interface (NI) queue in each direction, connected to the corresponding queue or group of queues in the router
  • SGI Origin 2000
    • Two virtual networks; uses back-off intervention and invalidation to avoid cycles in the network dependence graph
  • Stanford DASH
    • Two virtual networks; if an incoming request needs space in the outgoing request network and the outgoing request queue is full, the controller waits for a pre-defined number of cycles and, if the queue is still full, sends a NACK to the requester
  • AlphaServer GS320
    • Three virtual networks; longest transaction is 3-hop
  • Stanford FLASH
    • Four virtual networks; longest transaction is 4-hop (special case of reply generating a reply)
  • Alpha 21364 router
    • 19 virtual channels (essentially queues) in each direction per port: each of the six coherence message types forms a virtual network with 3 channels (6 x 3 = 18; the 3 channels within a network are used for adaptive routing), and one extra channel forms the seventh virtual network, which carries some special coherence control messages
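
The wait-then-NACK policy described for Stanford DASH above can be sketched as follows. This is a minimal illustration, not the DASH implementation: the function name `forward_or_nack`, the `queue_has_space` callback, and the timeout value are all hypothetical.

```python
def forward_or_nack(queue_has_space, timeout_cycles):
    """Decide the fate of an incoming request that needs an outgoing slot.

    queue_has_space: hypothetical callback, cycle -> bool, reporting whether
                     the outgoing request queue has a free slot in that cycle.
    timeout_cycles:  the pre-defined number of cycles to wait (assumed value).
    """
    # Check at arrival (cycle 0) and then once per cycle while waiting.
    for cycle in range(timeout_cycles + 1):
        if queue_has_space(cycle):
            return ("FORWARD", cycle)
    # Still full after the waiting period: give up and NACK the requester,
    # so the request stops holding an incoming queue slot.
    return ("NACK", timeout_cycles)


# The queue drains after 3 cycles: the request is forwarded.
print(forward_or_nack(lambda cycle: cycle >= 3, timeout_cycles=8))
# The queue never drains: the requester gets a NACK and must retry.
print(forward_or_nack(lambda cycle: False, timeout_cycles=8))
```

The NACK bounds how long a request can wait on the outgoing queue, at the cost of a retry by the requester.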

Coherence controller occupancy

  • How long does it take to service a message on average?
    • If you imagine the coherence controller as a centralized server in a queuing model, occupancy is just the reciprocal of the service rate
    • The occupancy of servicing one message imposes a waiting time on subsequent messages (this shows up as a contention component in the total end-to-end latency)
      • Queuing analysis and simulation show that contention grows faster than quadratically with occupancy (Chaudhuri et al., 2003); other researchers later confirmed empirically that the growth is likely sub-cubic
      • The goal should be to design low-occupancy protocols
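
To get a feel for why contention can grow faster than quadratically with occupancy, here is a rough sketch that treats the controller as an M/M/1 queue. The M/M/1 model (Poisson arrivals, exponential service) is an assumption made only for illustration; it is not the model used in the cited study, and the arrival rate and occupancy values are made up. Under this assumption the mean waiting time in the queue is W_q = λs² / (1 − λs), where λ is the message arrival rate and s is the occupancy.

```python
def mm1_waiting_time(arrival_rate, occupancy):
    """Mean queueing delay (cycles) at an M/M/1 server.

    arrival_rate: messages per cycle reaching the controller (assumed).
    occupancy:    mean cycles to service one message (1 / service rate).
    """
    utilization = arrival_rate * occupancy
    if utilization >= 1.0:
        raise ValueError("controller is saturated; the queue grows without bound")
    return arrival_rate * occupancy ** 2 / (1.0 - utilization)


arrival_rate = 0.01  # hypothetical: one message every 100 cycles
for occupancy in (10, 20, 40):
    wait = mm1_waiting_time(arrival_rate, occupancy)
    print(f"occupancy={occupancy:3d} cycles  waiting time={wait:6.2f} cycles")
# Waiting times are approximately 1.11, 5.00, and 26.67 cycles:
# each doubling of occupancy multiplies the delay by more than 4.
```

The s² term alone would give a factor of 4 per doubling. The 1/(1 − λs) term pushes the growth beyond quadratic as the controller approaches saturation.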

Protocol occupancy

  • The goal is to design a low-occupancy protocol
    • This doesn’t mean the protocol cannot do smart things
    • A high-occupancy protocol can still perform well if it can reduce the message count accordingly
    • Latency-tolerating techniques such as prefetching usually put more pressure on the coherence controller (why?)
      • Leads to an increased average protocol occupancy
    • Some bad protocol decisions
      • Invalidation acknowledgments at home
      • Replacement hints
      • NACKs
    • Final design is usually influenced by directory organization and coherence controller microarchitecture
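
The trade-off between per-message occupancy and message count can be made concrete with a small back-of-the-envelope calculation. The two protocols and all numbers below are hypothetical, chosen only to illustrate the point; the helper `controller_busy_cycles` is not from any real system.

```python
def controller_busy_cycles(messages_per_transaction, occupancy_per_message):
    """Total cycles the controller is busy for one transaction."""
    return messages_per_transaction * occupancy_per_message


# Hypothetical "simple" protocol: cheap handlers, but more messages.
simple = controller_busy_cycles(messages_per_transaction=4, occupancy_per_message=10)
# Hypothetical "smart" protocol: costlier handlers, but half the messages.
smart = controller_busy_cycles(messages_per_transaction=2, occupancy_per_message=18)

print(simple, smart)  # 40 36
```

In this made-up case the protocol with higher per-message occupancy keeps the controller busy for fewer cycles overall, because it reduces the message count enough to compensate.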

Directory controllers

  • Two main designs
    • Hardwired finite state machines (fixed protocol)
    • Software protocol running on one of:
      • An embedded protocol processor in the memory controller (suited for off-chip memory controllers)
      • A protocol thread in the main processor (suited for multi-threaded processors)
      • A protocol core in the main processor (suited for multi-core processors)
  • Hardwired FSM
    • Low occupancy (all-hardware)
    • Protocol must be simple enough to design and verify in hardware
    • Possible to pipeline various stages of protocol processing
    • Cannot afford late-binding or flexibility in the choice of protocol
    • SGI Origin 2000, MIT Alewife, Stanford DASH
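
The "fixed protocol" nature of a hardwired FSM can be pictured as a constant transition table indexed by directory state and incoming message type. The sketch below models that idea in software. The three states, three message types, and the action strings form a deliberately simplified, hypothetical protocol; they are not the protocol of Origin 2000, Alewife, or DASH.

```python
from enum import Enum


class State(Enum):
    UNOWNED = 0
    SHARED = 1
    EXCLUSIVE = 2


class Msg(Enum):
    READ = 0
    READ_EXCLUSIVE = 1
    WRITEBACK = 2


# Fixed at design time: (directory state, message) -> (next state, action).
# In hardware this would be combinational logic, not a mutable table.
TRANSITIONS = {
    (State.UNOWNED, Msg.READ): (State.SHARED, "reply with data from memory"),
    (State.UNOWNED, Msg.READ_EXCLUSIVE): (State.EXCLUSIVE, "reply with data from memory"),
    (State.SHARED, Msg.READ): (State.SHARED, "reply with data from memory"),
    (State.SHARED, Msg.READ_EXCLUSIVE): (State.EXCLUSIVE, "invalidate sharers, reply with data"),
    (State.EXCLUSIVE, Msg.READ): (State.SHARED, "intervene at owner, forward data"),
    (State.EXCLUSIVE, Msg.READ_EXCLUSIVE): (State.EXCLUSIVE, "intervene at owner, transfer ownership"),
    (State.EXCLUSIVE, Msg.WRITEBACK): (State.UNOWNED, "write data back to memory"),
}


def step(state, msg):
    """Look up the next directory state and the action for one message.

    Raises KeyError for combinations this simplified protocol does not define.
    """
    return TRANSITIONS[(state, msg)]


state = State.UNOWNED
for msg in (Msg.READ, Msg.READ_EXCLUSIVE, Msg.WRITEBACK):
    state, action = step(state, msg)
    print(f"{msg.name:15s} -> {state.name:9s} ({action})")
```

Because every entry is decided up front, changing the protocol means changing the table itself. This is the loss of late binding noted above, and it is the price paid for a low-occupancy lookup.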