Module 12: "Multiprocessors on a Snoopy Bus"
  Lecture 26: "Case Studies"
 

Multi-level caches

  • Split-transaction bus makes the design of multi-level caches a little more difficult
    • The usual design is to have queues between levels of caches in each direction
    • How do you size the queues? Between the processor and L1 one buffer is sufficient (assume one outstanding processor access); L1-to-L2 needs P+1 buffers (why?); L2-to-L1 needs P buffers (why?); L1-to-processor needs one buffer
    • With smaller buffers there is a possibility of deadlock: suppose the L1-to-L2 and L2-to-L1 queues have one entry each, with a request sitting in the L1-to-L2 queue and an intervention sitting in the L2-to-L1 queue; clearly L1 cannot pick up the intervention because it has no space to put the reply in the L1-to-L2 queue, while L2 cannot pick up the request because it might need space in the L2-to-L1 queue in case of an L2 hit (a short sketch after this list makes the stall concrete)
  • Formalizing the deadlock with dependence graph
    • There are four types of transactions in the cache hierarchy: 1. Processor requests (outbound requests), 2. Responses to processor requests (inbound responses), 3. Interventions (inbound requests), 4. Intervention responses (outbound responses)
    • Processor requests need space in L1-to-L2 queue; responses to processors need space in L2-to-L1 queue; interventions need space in L2-to-L1 queue; intervention responses need space in L1-to-L2 queue
    • Thus a message in the L1-to-L2 queue may need space in the L2-to-L1 queue (e.g., a processor request generating a response due to an L2 hit); also a message in the L2-to-L1 queue may need space in the L1-to-L2 queue (e.g., an intervention response)
    • This creates a cycle in queue space dependence graph
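
  The stall above can be made concrete with a short simulation. The following is a minimal sketch in Python, assuming single-entry queues and the conservative drain rules discussed later in this lecture; all names (l1_to_l2, l2_to_l1, the *_can_drain helpers) are illustrative, not part of any real controller.

    # Minimal sketch of the two-queue deadlock: one-entry queues in each
    # direction, one request and one intervention in flight.
    from collections import deque

    CAPACITY = 1
    l1_to_l2 = deque(["processor request"])  # outbound queue: full
    l2_to_l1 = deque(["intervention"])       # inbound queue: full

    def l2_can_drain():
        # L2 refuses to pop l1_to_l2 unless l2_to_l1 has space, because the
        # request might hit in L2 and generate a response for the processor.
        return len(l2_to_l1) < CAPACITY

    def l1_can_drain():
        # L1 refuses to pop l2_to_l1 unless l1_to_l2 has space, because the
        # intervention must generate a reply headed toward L2.
        return len(l1_to_l2) < CAPACITY

    # Neither controller can make progress: deadlock.
    print(l2_can_drain(), l1_can_drain())  # -> False False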

Dependence graph

  • Represent a queue by a vertex in the graph
    • Number of vertices = number of queues
  • A directed edge from vertex u to vertex v is present if a message at the head of queue u may generate another message which requires space in queue v
  • In our case we have two queues
    • L1-to-L2 and L2-to-L1; each may hold a message at its head that needs space in the other, so the graph has a cycle, i.e., it is not a DAG, hence deadlock is possible
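
  This check is mechanical. Below is a small sketch in Python, assuming the two-queue design above: the graph encodes the "may need space in" relation, and a standard depth-first search detects the cycle. The vertex names are illustrative.

    # Queue-space dependence graph for the single-queue-pair design.
    # An edge u -> v means: a message at the head of queue u may generate
    # a message that needs space in queue v.
    graph = {
        "L1-to-L2": ["L2-to-L1"],  # processor request may generate a response (L2 hit)
        "L2-to-L1": ["L1-to-L2"],  # intervention generates an intervention reply
    }

    def has_cycle(g):
        # Standard DFS; a vertex on the current recursion stack that is
        # reached again closes a cycle.
        state = {}  # vertex -> "on-stack" or "done"
        def dfs(u):
            state[u] = "on-stack"
            for v in g.get(u, ()):
                if state.get(v) == "on-stack" or (v not in state and dfs(v)):
                    return True
            state[u] = "done"
            return False
        return any(v not in state and dfs(v) for v in list(g))

    print(has_cycle(graph))  # -> True: not a DAG, so deadlock is possible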

 

Multi-level caches

  • In summary
    • The L2 cache controller refuses to drain the L1-to-L2 queue if there is no space in the L2-to-L1 queue; this is rather conservative because the message at the head of the L1-to-L2 queue may not need space in the L2-to-L1 queue, e.g., in case of an L2 miss or if it is an intervention reply; but after popping the head of the L1-to-L2 queue it is impossible to backtrack if the message does turn out to need space in the L2-to-L1 queue
    • Similarly, L1 cache controller refuses to drain L2-to-L1 queue if there is no space in L1-to-L2 queue
    • How do we break this cycle?
    • Observe that responses to processor requests are guaranteed not to generate any more messages, and intervention requests do not generate new requests but can only generate replies
  • Solving the queue deadlock
    • Introduce one more queue in each direction i.e. have a pair of queues in each direction
    • L1-to-L2 processor request queue and L1-to-L2 intervention response queue
    • Similarly, L2-to-L1 intervention request queue and L2-to-L1 processor response queue
    • Now the L2 cache controller can serve the L1-to-L2 processor request queue as long as there is space in the L2-to-L1 processor response queue; there is no constraint on the L1 cache controller for draining the L2-to-L1 processor response queue because responses are simply consumed by the processor and generate no further messages
    • Similarly, the L1 cache controller can serve the L2-to-L1 intervention request queue as long as there is space in the L1-to-L2 intervention response queue, and the L1-to-L2 intervention response queue will drain as soon as the bus is granted
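
  Running the same dependence-graph check on the split design confirms the fix. A minimal sketch under the same assumptions as the earlier snippets (the has_cycle DFS is repeated so this runs on its own; queue names are illustrative):

    # Dependence graph after splitting each direction into a request queue
    # and a response queue. Response queues are sinks: processor responses
    # are consumed by the processor, and intervention responses drain onto
    # the bus as soon as it is granted, so neither has outgoing edges.
    graph = {
        "L1-to-L2 proc request":  ["L2-to-L1 proc response"],  # L2 hit -> response
        "L2-to-L1 intervention":  ["L1-to-L2 intv response"],  # intervention -> reply
        "L2-to-L1 proc response": [],                          # sunk by the processor
        "L1-to-L2 intv response": [],                          # sunk by the bus
    }

    def has_cycle(g):  # same DFS as in the earlier sketch
        state = {}
        def dfs(u):
            state[u] = "on-stack"
            for v in g.get(u, ()):
                if state.get(v) == "on-stack" or (v not in state and dfs(v)):
                    return True
            state[u] = "done"
            return False
        return any(v not in state and dfs(v) for v in list(g))

    print(has_cycle(graph))  # -> False: the graph is a DAG, the queue deadlock is gone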