Module 9: "Introduction to Shared Memory Multiprocessors"
  Lecture 16: "Multiprocessor Organizations and Cache Coherence"
 

What went wrong?

  • Write-through cache
    • The memory value may be correct if the writes are correctly ordered
    • But the system allowed a store to proceed while a copy of the line was already cached elsewhere
    • Lesson learned: must invalidate all cached copies before allowing a store to proceed
  • Writeback cache
    • Problem is even more complicated: stores are no longer visible to memory immediately
    • Writeback order is important
    • Lesson learned: do not allow more than one copy of a cache line in M state
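
The write-through lesson can be illustrated with a minimal sketch. Everything in it is a hypothetical toy model and not part of the lecture: the class `WriteThroughCache`, the helper `run`, the address `0x40`, and the values 5 and 7 are assumptions chosen for illustration. P1 caches a value, P0 then stores to the same address, and P1 reads again, once without and once with invalidation of the other cached copies.

```python
class WriteThroughCache:
    """Toy write-through cache (hypothetical model, one value per address)."""

    def __init__(self, memory):
        self.memory = memory   # shared dict: address -> value
        self.lines = {}        # this processor's private cached copies

    def load(self, addr):
        if addr not in self.lines:            # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]               # hit: may be stale

    def store(self, addr, value, peers=(), invalidate=False):
        if invalidate:                        # invalidate other copies first
            for peer in peers:
                peer.lines.pop(addr, None)
        self.lines[addr] = value
        self.memory[addr] = value             # write through to memory


def run(invalidate):
    """Return (memory value, value P1 observes) after P0's store."""
    memory = {0x40: 5}
    p0, p1 = WriteThroughCache(memory), WriteThroughCache(memory)
    p1.load(0x40)                             # P1 caches the old value 5
    p0.store(0x40, 7, peers=[p1], invalidate=invalidate)
    return memory[0x40], p1.load(0x40)


stale = run(invalidate=False)     # memory is correct, P1 still reads 5
coherent = run(invalidate=True)   # P1 misses and refetches 7
```

Without invalidation, memory holds the new value but P1 keeps hitting on its old copy; with invalidation, P1's next load misses and fetches the new value from memory.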

Implementations

  • Must invalidate all cached copies before allowing a store to proceed
    • Need to know where the cached copies are
    • Solution1: Never mind! Just tell everyone that you are going to do a store
      • Leads to broadcast snoopy protocols
      • Popular with small-scale bus-based CMPs and SMPs
      • AMD Opteron implements it on a distributed network (the Hammer protocol)
      • The biggest reason why everyday Windows users would buy small-scale multiprocessors and multi-core machines today
    • Solution2: Keep track of the sharers and invalidate them when needed
      • Where and how is this information stored?
      • Leads to directory-based scalable protocols
  • Directory-based protocols
    • Maintain one directory entry per memory block
    • Each directory entry contains a sharer bitvector and state bits
    • Concept of home node in distributed shared memory multiprocessors
    • Concept of sparse directory for on-chip coherence in CMPs
  • Do not allow more than one copy of a cache line in M state
    • Need some form of access control mechanism
    • Before a processor does a store it must obtain “permission” from the current “owner” (if any)
    • Need to know who the current owner is
      • Either a processor or main memory
    • Solution1 and Solution2 apply here also
  • Latest value must be propagated to the requester
    • Notion of “latest” is very fuzzy
    • Once we know the owner, this is easy
    • Solution1 and Solution2 apply here also
  • Invariant: if a cache block is not in M state in any processor, memory must provide the block to the requester
    • Memory must be updated when a block transitions from M state to S state
    • Note that a transition from M to I always updates memory in systems with writeback caches (these are normal writeback operations)
  • Most of the implementation of a coherence protocol deals with uncommon cases and races
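
The directory-based scheme above can be sketched as follows. This is a simplified model under stated assumptions, not a real protocol implementation: the names `State`, `Directory`, `read`, and `write` are hypothetical, there are only three states (I, S, M), requests are handled one at a time (so none of the races mentioned above arise), and each cache holds whole values rather than lines. Each directory entry holds state bits and a sharer bitvector; a store invalidates all other copies, and a transition out of M writes the block back to memory.

```python
from enum import Enum


class State(Enum):
    I = "invalid"
    S = "shared"
    M = "modified"


class Directory:
    """Toy directory with one entry per memory block (hypothetical model)."""

    def __init__(self, num_procs, memory):
        self.n = num_procs
        self.memory = memory                           # address -> value
        self.entries = {}                              # address -> [state, sharer bitvector]
        self.caches = [{} for _ in range(num_procs)]   # per-processor copies

    def _entry(self, addr):
        return self.entries.setdefault(addr, [State.I, 0])

    def _writeback_from_owner(self, addr, entry):
        # In M state exactly one sharer bit is set: that processor is the owner.
        owner = entry[1].bit_length() - 1
        self.memory[addr] = self.caches[owner][addr]

    def read(self, proc, addr):
        entry = self._entry(addr)
        if addr in self.caches[proc]:                  # hit in S or M
            return self.caches[proc][addr]
        if entry[0] is State.M:                        # M -> S updates memory
            self._writeback_from_owner(addr, entry)
        entry[0] = State.S
        entry[1] |= 1 << proc                          # record the new sharer
        self.caches[proc][addr] = self.memory[addr]    # memory provides the block
        return self.caches[proc][addr]

    def write(self, proc, addr, value):
        entry = self._entry(addr)
        if entry[0] is State.M and entry[1] != (1 << proc):
            self._writeback_from_owner(addr, entry)    # M -> I writeback
        for p in range(self.n):                        # invalidate all other copies
            if p != proc and entry[1] & (1 << p):
                self.caches[p].pop(addr, None)
        entry[0], entry[1] = State.M, 1 << proc        # single copy in M state
        self.caches[proc][addr] = value                # writeback cache: memory not updated


d = Directory(4, {0x80: 1})
d.read(0, 0x80)
d.read(1, 0x80)                                        # S, sharers 0b0011
d.write(2, 0x80, 9)                                    # M, sharers 0b0100
state_after_write = (d.entries[0x80][0], d.entries[0x80][1], d.memory[0x80])
value = d.read(3, 0x80)                                # owner writes back, then S
```

After the store by processor 2, memory still holds the old value because the caches are writeback; the later read by processor 3 forces the M to S transition, which updates memory before memory supplies the block.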