Module 12: "Multiprocessors on a Snoopy Bus"
  Lecture 24: "Write Serialization in a Simple Design"
 

Livelock

  • Consider the following example
    • P0 and P1 try to write to the same cache line
    • P0 gets exclusive ownership, fills the line in cache and notifies the load/store unit (or retirement unit) to retry the store
    • While all this is happening, P1’s request appears on the bus and P0’s cache controller changes the tag state to I before the store can retry
    • This can easily lead to a livelock
    • Normally this is avoided by giving the load/store unit higher priority for tag access (i.e. the snoop logic cannot modify the tag arrays when there is a processor access pending in the same clock cycle)
    • Such a race is even rarer in a multi-level cache hierarchy (more on this later)
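The priority rule above can be sketched as a toy single-cycle arbitration (the class and function names here are illustrative, not from any real design): when a processor store and a snoop contend for the tag array in the same cycle, the load/store unit wins, so P0's write cannot be invalidated out from under it.

```python
# Toy model: one cache line with a MESI-style state, and a per-cycle
# tag-access arbiter that favors the processor's pending store.

class CacheLine:
    def __init__(self):
        self.state = "I"  # M, E, S, or I

def cycle(line, store_pending, snoop_pending):
    """Arbitrate tag access for one clock cycle.

    The load/store unit has higher priority: the snoop logic may not
    modify the tag while a processor store to the line is pending.
    """
    if store_pending:
        # The retried store completes against the owned line.
        assert line.state in ("M", "E")
        line.state = "M"
        return "store_done"
    if snoop_pending:
        line.state = "I"  # invalidate on behalf of the other processor
        return "snoop_done"
    return "idle"

# P0 has just gained ownership (state E) and its store is about to
# retry, but P1's invalidating snoop arrives in the same cycle.
line = CacheLine()
line.state = "E"
result = cycle(line, store_pending=True, snoop_pending=True)
print(result, line.state)  # store wins: "store_done M"
```

With the opposite priority, the snoop would flip the line to I first and P0's store would miss again, recreating the ping-pong the bullet describes.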

Starvation

  • Some amount of fairness is necessary in the bus arbiter
    • An FCFS policy is possible for granting the bus, but it requires some buffering in the arbiter to hold already-placed requests
    • Most machines implement an aging scheme that tracks the number of times a particular request is denied; when the count crosses a threshold, that request becomes the highest priority (this too needs some storage)
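A minimal sketch of such an aging arbiter (the threshold value and tie-break rule are assumptions for illustration): each requester carries a deny counter, and once the counter crosses the threshold that requester is granted ahead of everyone else.

```python
# Illustrative aging bus arbiter: deny counts implement the fairness
# guarantee; the threshold is an arbitrary choice for this sketch.

THRESHOLD = 3

class AgingArbiter:
    def __init__(self, n_requesters):
        self.deny_count = [0] * n_requesters

    def grant(self, requesters):
        """Pick one requester from the current request set."""
        # A starving requester (deny count at/over threshold) wins first.
        starving = [r for r in requesters
                    if self.deny_count[r] >= THRESHOLD]
        # Tie-break by lowest requester id (an assumption of this toy).
        winner = min(starving) if starving else min(requesters)
        for r in requesters:
            if r != winner:
                self.deny_count[r] += 1
        self.deny_count[winner] = 0
        return winner

arb = AgingArbiter(4)
# Requester 3 keeps losing to requester 0...
for _ in range(THRESHOLD):
    assert arb.grant([0, 3]) == 0
# ...until its deny count crosses the threshold and it is prioritized.
assert arb.grant([0, 3]) == 3
```

The storage cost the bullet mentions is visible here: one counter per requester, so it grows with the number of processors on the bus.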

More on LL/SC

  • We have seen that both LL and SC may suffer from cache misses (a read followed by an upgrade miss)
  • Is it possible to save one transaction?
    • What if I design my cache controller in such a way that it can recognize LL instructions and launch a BusRdX instead of BusRd?
    • This is called Read-for-Ownership (RFO); also used by the Intel atomic xchg instruction
    • Nice idea, but you have to be careful
    • By doing this you have greatly increased the probability of a livelock: before the SC executes, there is a high probability that another processor’s LL will take the line away
    • A possible solution is to buffer incoming snoop requests until the SC completes (buffer space is proportional to P); this may introduce new deadlock cycles (especially on modern non-atomic buses)
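The RFO livelock hazard can be shown with a toy model (this is an assumed simplification, not real ISA semantics): each LL issued as a BusRdX immediately steals exclusive ownership and kills the other processor's reservation, so an interleaving of two LLs leaves at most one SC able to succeed.

```python
# Toy LL/SC model under Read-for-Ownership: ownership and the link
# (reservation) flag are tracked per line; two processors, P0 and P1.

class Line:
    def __init__(self):
        self.value = 0
        self.owner = None  # which processor holds the line exclusively

def ll_rfo(line, cpu, link):
    """LL issued as BusRdX: takes exclusive ownership immediately."""
    line.owner = cpu
    link[cpu] = True          # set this cpu's reservation
    link[1 - cpu] = False     # the other cpu loses its reservation
    return line.value

def sc(line, cpu, link, value):
    """SC succeeds only if the reservation and ownership survived."""
    if link[cpu] and line.owner == cpu:
        line.value = value
        return True
    return False  # failed; caller must loop back to the LL

line, link = Line(), [False, False]
v0 = ll_rfo(line, 0, link)        # P0's LL takes the line...
v1 = ll_rfo(line, 1, link)        # ...then P1's LL steals it back
print(sc(line, 0, link, v0 + 1))  # False: P0's SC fails, P0 retries
print(sc(line, 1, link, v1 + 1))  # True: P1 wins this round
```

With enough processors interleaving their LLs, every SC can keep failing, which is exactly why the snoop-buffering fix (at the cost of space proportional to P) is suggested.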

Multi-level caches

  • We have talked about multi-level caches and the involved inclusion property
  • Multiprocessors create new problems related to multi-level caches
    • A bus snoop result may be relevant to inner levels of the cache hierarchy, yet bus transactions are not directly visible to the first-level cache controller
    • Similarly, modifications made in the first-level cache may not be visible to the second-level cache controller, which is responsible for handling bus requests
  • Inclusion property makes it easier to maintain coherence
    • Since the L1 cache is a subset of the L2 cache, a snoop that misses in the L2 cache need not be forwarded to the L1 cache
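The snoop-filtering benefit of inclusion can be sketched with illustrative tag sets (the addresses and data structures here are made up for the example): because every L1 line is also present in L2, a snoop that misses in L2 is answered without ever touching the L1 tag arrays.

```python
# Toy inclusion model: tag sets for a two-level hierarchy where
# every L1 line is, by the inclusion property, also present in L2.

l1_tags = {0x40, 0x80}
l2_tags = {0x40, 0x80, 0xC0, 0x100}  # inclusion: l1_tags <= l2_tags

def snoop(addr):
    """Return True if the L1 tag arrays must also be probed."""
    if addr not in l2_tags:
        # Inclusion guarantees the line cannot be in L1 either,
        # so the snoop is filtered at L2.
        return False
    # L2 hit: the L1 copy (if any) may also need action.
    return addr in l1_tags

assert l1_tags <= l2_tags  # the inclusion property holds
print(snoop(0x200))  # False: L2 miss, L1 never probed
print(snoop(0x40))   # True: L2 hit and the line is also in L1
```

Without inclusion, every bus transaction would have to probe the L1 tags as well, stealing tag bandwidth from the processor on every snoop.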