Module 10: "Design of Shared Memory Multiprocessors"
  Lecture 20: "Performance of Coherence Protocols"
 

MOESI protocol

  • Some SMPs implement MOESI today, e.g., the AMD Athlon MP and IBM servers
  • Why is the O state needed?
    • O state is very similar to E state, with four differences:
      1. If a cache line is in O state in some cache, that cache is responsible for sourcing the line to the next requester
      2. The memory may not have the most up-to-date copy of the line (this is why 1 is necessary)
      3. Eviction of a line in O state generates a BusWB
      4. A write to a line in O state must generate a bus transaction
    • Without the O state (as in MESI), when a line transitions from M to S it is necessary to write the line back to memory
    • For a migratory sharing pattern (frequent in database workloads) this leads to a series of writebacks to memory
    • These writebacks just keep the memory banks busy and consume memory bandwidth
  • Take the following example
    • P0 reads x, P0 writes x, P1 reads x, P1 writes x, P2 reads x, P2 writes x, …
    • Under MESI, at the time of each BusRd response the cache holding the line in M writes it back to memory: one writeback per processor handover
    • The O state aims to eliminate all these writebacks by transitioning from M to O instead of M to S on a BusRd/Flush
    • Subsequent BusRd requests are answered by the owner holding the line in O state
    • The line is written back only when the owner evicts it: one single writeback
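
The writeback counts in the example above can be checked with a small simulation. This is a minimal sketch for a single cache line, not a full protocol model: the names `count_writebacks` and `migratory` are hypothetical, a read miss on a MOESI line is assumed to leave the old owner in O and fill the requester in S, and a write miss that finds a dirty copy elsewhere is assumed to be served cache-to-cache without a memory writeback (that case does not occur in the migratory pattern). The final eviction of the last dirty copy is counted as well.

```python
def count_writebacks(protocol, accesses, evict_at_end=True):
    """Count memory writebacks for one cache line.

    protocol: "MESI" or "MOESI"
    accesses: list of (processor, op) pairs, op is "R" or "W"
    """
    state = {}          # processor -> state; a missing entry means I
    writebacks = 0
    for proc, op in accesses:
        mine = state.get(proc, "I")
        others = [p for p in state if p != proc and state[p] != "I"]
        if op == "R":
            if mine != "I":
                continue                    # read hit, no bus transaction
            for p in others:                # BusRd is snooped by the others
                if state[p] == "M":
                    if protocol == "MESI":
                        writebacks += 1     # M -> S forces a memory writeback
                        state[p] = "S"
                    else:
                        state[p] = "O"      # M -> O, memory stays stale
                elif state[p] == "E":
                    state[p] = "S"
            state[proc] = "S" if others else "E"
        else:
            if mine == "M":
                continue                    # write hit in M
            if mine == "E":
                state[proc] = "M"           # silent upgrade
                continue
            for p in others:                # BusRdX or BusUpgr invalidates
                state[p] = "I"
            state[proc] = "M"
    if evict_at_end:
        # Evicting a dirty line (M or O) generates one BusWB.
        writebacks += sum(1 for s in state.values() if s in ("M", "O"))
    return writebacks


def migratory(n):
    """P0 reads x, P0 writes x, P1 reads x, P1 writes x, ... for n processors."""
    return [(p, op) for p in range(n) for op in ("R", "W")]
```

With three processors, `count_writebacks("MESI", migratory(3))` gives one writeback per handover plus the final eviction, while the MOESI run only pays for the final eviction.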
  • State transitions pertaining to O state
    • I to O: not possible under the first design choice below; possible under the second, where the requester is filled in state O
    • E to O or S to O: not possible
    • M to O: on a BusRd/Flush (but no memory writeback)
    • O to I: on CacheEvict/BusWB or {BusRdX,BusUpgr}/Flush
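    • O to S: not possible under the first design choice below; possible under the second, where the old owner hands over ownership on a BusRd
    • O to E: not possible (or possibly, if silent eviction is not allowed)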
    • O to M: on PrWr/BusUpgr
  • At most one cache can have a line in O state at any point in time
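
The O-state transitions listed above can be written as a lookup table, together with a check for the single-owner rule. This is only a sketch: `O_TRANSITIONS`, `step`, and `at_most_one_owner` are hypothetical names, and the table contains only the transitions that the list marks as definitely possible, not the design-dependent "maybe" cases.

```python
# (current state, observed event) -> (next state, generated action)
O_TRANSITIONS = {
    ("M", "BusRd"):      ("O", "Flush"),    # no memory writeback
    ("O", "CacheEvict"): ("I", "BusWB"),
    ("O", "BusRdX"):     ("I", "Flush"),
    ("O", "BusUpgr"):    ("I", "Flush"),
    ("O", "PrWr"):       ("M", "BusUpgr"),
}


def step(state, event):
    """Return (next_state, action); raises KeyError for unlisted pairs."""
    return O_TRANSITIONS[(state, event)]


def at_most_one_owner(states):
    """Check that at most one cache holds the line in O state."""
    return sum(1 for s in states if s == "O") <= 1
```

For example, `step("M", "BusRd")` returns `("O", "Flush")`, and `at_most_one_owner(["O", "S", "S", "I"])` holds while `at_most_one_owner(["O", "O"])` does not.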
  • Two main design choices for MOESI
    • Consider the example P0 reads x, P0 writes x, P1 reads x, P2 reads x, P3 reads x, …
    • When P1 launches BusRd, P0 sources the line and now the protocol has two options:
      1. The line in P0 goes to O and the line in P1 is filled in state S
      2. The line in P0 goes to S and the line in P1 is filled in state O, i.e., P1 inherits ownership from P0
    • For bus-based SMPs the two choices will yield roughly the same performance
    • For DSM multiprocessors we will revisit this issue if time permits
    • According to the second choice, when P2 generates a BusRd request, P1 sources the line and transitions from O to S; P2 becomes the new owner
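
The two design choices can be compared on the example above (P0 already holds x in M, then P1, P2, P3 read it in turn). A minimal sketch, assuming one cache line and the hypothetical name `read_sequence`; it tracks only the final states and which cache sourced each BusRd.

```python
def read_sequence(readers, inherit_ownership):
    """P0 holds the line in M; each reader in turn issues a BusRd.

    inherit_ownership=False: first choice, the old owner goes to O
    and the requester is filled in S.
    inherit_ownership=True: second choice, the old owner goes to S
    and the requester is filled in O.
    """
    state = {0: "M"}
    sourced_by = []
    for r in readers:
        # The cache in M or O is responsible for sourcing the line.
        owner = next(p for p, s in state.items() if s in ("M", "O"))
        sourced_by.append(owner)
        if inherit_ownership:
            state[owner] = "S"
            state[r] = "O"
        else:
            state[owner] = "O"
            state[r] = "S"
    return state, sourced_by
```

Under the first choice P0 remains the owner and sources every request; under the second, ownership moves to the most recent reader, so each request is sourced by the previous requester.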
  • Some SMPs do not support the E state
    • In many cases it is not helpful and only complicates the protocol
    • MOSI allows a compact state encoding in 2 bits
    • Sun WildFire uses the MOSI protocol
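
The remark about compact encoding follows from counting states: four states fit in 2 bits, five need 3. A small sketch of that arithmetic, with `state_bits` as a hypothetical helper name:

```python
def state_bits(states):
    """Minimum number of bits needed to encode the given set of states."""
    return (len(states) - 1).bit_length()


MOSI = ["M", "O", "S", "I"]
MOESI = ["M", "O", "E", "S", "I"]
```

Here `state_bits(MOSI)` is 2 and `state_bits(MOESI)` is 3, so dropping E saves one state bit per cache line.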

Dragon protocol

  • An update-based protocol for writeback caches
  • Four states: Two of them are standard E and M
    • Shared clean (Sc): The standard S state
    • Shared modified (Sm): This is really the O state
  • In fact, five states, because you always have I, i.e., not in cache
  • So really a MOESI update-based protocol
  • New bus transaction: BusUpd
    • Used to update part of a cache line
  • Distinguish between cache hits and misses:
    • PrRd and PrWr are hits, PrRdMiss and PrWrMiss are misses
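
The hit/miss distinction above can be sketched as a function of the line's current state. This assumes the five states named earlier (E, Sc, Sm, M, plus I for not in cache); `processor_event` and the `"read"`/`"write"` operation names are hypothetical.

```python
DRAGON_VALID_STATES = {"E", "Sc", "Sm", "M"}   # I means not in cache


def processor_event(op, state):
    """Map a processor access to the Dragon processor-side event name."""
    hit = state in DRAGON_VALID_STATES
    if op == "read":
        return "PrRd" if hit else "PrRdMiss"
    if op == "write":
        return "PrWr" if hit else "PrWrMiss"
    raise ValueError(f"unknown operation: {op}")
```

For example, a write to a line in Sc is a `PrWr`, while a write to a line in I is a `PrWrMiss`.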