Module 12: "Multiprocessors on a Snoopy Bus"
  Lecture 24: "Write Serialization in a Simple Design"
 

Agenda

  • Goal is to understand what influences the performance, cost and scalability of SMPs
    • Details of physical design of SMPs
    • At least three goals of any design: correctness, performance, low hardware complexity
    • Performance gains are normally achieved by pipelining memory transactions and having multiple outstanding requests
    • These performance optimizations occasionally introduce new protocol races involving transient states, leading to correctness issues in terms of coherence and consistency

Correctness goals

  • Must enforce coherence and write serialization
    • Recall that write serialization guarantees that all writes to a location are seen in the same order by all processors
  • Must obey the target memory consistency model
    • If sequential consistency is the goal, the system must provide write atomicity and detect write completion correctly (write atomicity extends write serialization to writes to any location, i.e. it guarantees that every write appears at the same position in the total order seen by all processors)
  • Must be free of deadlock, livelock and starvation
    • Starvation confined to a part of the system is not as problematic as deadlock and livelock
    • However, system-wide starvation leads to livelock
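
The write serialization requirement above can be illustrated with a small checker. This is a minimal sketch, not part of the lecture: it assumes every write to the location carries a unique label and that each processor reports the writes it observed, in order (possibly missing some). The function name `find_serial_order` and the labels `w1`, `w2`, `w3` are hypothetical. The writes are serialized only if a single total order is consistent with every observer, which amounts to the observed "earlier than" relation having no cycle.

```python
from graphlib import TopologicalSorter, CycleError  # standard library, Python 3.9+


def find_serial_order(observations):
    """Return one total order of writes consistent with every observer, or None.

    observations maps a processor name to the list of write labels it saw,
    in the order it saw them (hypothetical input format).
    """
    graph = {}  # write label -> set of writes that must come before it
    for seen in observations.values():
        for w in seen:
            graph.setdefault(w, set())
        # Each adjacent pair in one observer's view adds an ordering constraint.
        for earlier, later in zip(seen, seen[1:]):
            graph[later].add(earlier)
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        # Two or more observers disagree on the order of some writes.
        return None


# All three processors agree on the relative order of the writes they saw.
consistent = {"P0": ["w1", "w2", "w3"], "P1": ["w1", "w3"], "P2": ["w2", "w3"]}
# P0 and P1 saw the same two writes in opposite orders.
violating = {"P0": ["w1", "w2"], "P1": ["w2", "w1"]}

print(find_serial_order(consistent))
print(find_serial_order(violating))
```

A cycle check is used rather than a pairwise comparison because three observers can each agree pairwise and still be jointly inconsistent.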

A simple design

  • Start with a rather naïve design
    • Each processor has a single level of data and instruction caches
    • The cache allows exactly one outstanding miss at a time, i.e. a new cache miss request is blocked while another is outstanding (this serializes all bus requests from a particular processor)
    • The bus is atomic i.e. it handles one request at a time
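
The two restrictions of this naïve design can be sketched as a toy simulation. It is only an illustration under stated assumptions: the class `BlockingCache`, the function `run_atomic_bus`, the round-robin arbitration policy, and the request strings such as "BusRdX A" are all hypothetical and not taken from the lecture. Each cache holds at most one outstanding miss, and the bus finishes one transaction before granting the next, so the bus log is a single total order of all requests.

```python
from collections import deque


class BlockingCache:
    """Cache that allows exactly one outstanding miss at a time."""

    def __init__(self, pid, misses):
        self.pid = pid
        self.queue = deque(misses)  # misses in program order
        self.outstanding = None     # the single miss currently in flight

    def try_issue(self):
        """Issue the next miss only if no other miss is outstanding."""
        if self.outstanding is None and self.queue:
            self.outstanding = self.queue.popleft()
        return self.outstanding

    def complete(self):
        self.outstanding = None


def run_atomic_bus(caches):
    """Grant one request at a time (round-robin, an assumed policy)."""
    bus_order = []
    while any(c.queue or c.outstanding is not None for c in caches):
        for c in caches:
            req = c.try_issue()
            if req is not None:
                # Atomic bus: the transaction completes before the next grant.
                bus_order.append((c.pid, req))
                c.complete()
    return bus_order


caches = [
    BlockingCache(0, ["BusRdX A", "BusRd B"]),
    BlockingCache(1, ["BusRdX A"]),
]
for pid, req in run_atomic_bus(caches):
    print(f"P{pid}: {req}")
```

In this model the two writes to block A are ordered by the bus grant order, and processor 0 cannot place its second request on the bus until its first one has completed.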

Cache controller

  • Must be able to respond to bus transactions as necessary
    • Handled by the snoop logic
  • The snoop logic should have access to the cache tags
    • A single set of tags cannot allow concurrent accesses by the processor-side and the bus-side controllers
    • When the snoop logic accesses the tags the processor must remain locked out from accessing the tags
    • Possible enhancements: two read ports in the tag RAM allow concurrent reads; duplicate copies of the tags are also possible; multiple banks also reduce contention
    • In all cases, updates to tags must still be atomic or must be applied to both copies in case of duplicate tags; however, tag updates are much less frequent than reads
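
The duplicate-tag option can be sketched as a small software model. This is a simplified illustration, assuming a direct-mapped cache with one tag per set; the class `DuplicateTags` and its method names are hypothetical. The processor side and the snoop logic each read their own copy of the tags, so lookups do not contend, while every tag update is applied to both copies in one step.

```python
class DuplicateTags:
    """Two copies of the tag array: one for the processor, one for the snooper."""

    def __init__(self, num_sets):
        self.proc_tags = [None] * num_sets   # read by the processor-side controller
        self.snoop_tags = [None] * num_sets  # read by the bus-side snoop logic

    def processor_lookup(self, set_index, tag):
        """Processor-side hit check; touches only the processor copy."""
        return self.proc_tags[set_index] == tag

    def snoop_lookup(self, set_index, tag):
        """Bus-side hit check; touches only the snoop copy."""
        return self.snoop_tags[set_index] == tag

    def update(self, set_index, tag):
        """Write both copies together so the two sides never disagree.

        In hardware this step must be atomic; here a single method call
        stands in for that atomicity.
        """
        self.proc_tags[set_index] = tag
        self.snoop_tags[set_index] = tag


tags = DuplicateTags(num_sets=4)
tags.update(1, 0x2A)                    # a fill installs tag 0x2A in set 1
print(tags.processor_lookup(1, 0x2A))   # processor-side lookup
print(tags.snoop_lookup(1, 0x2A))       # snoop-side lookup of the same block
```

Since updates are rare compared to lookups, the cost of writing both copies is paid infrequently, while the common case (reads from either side) proceeds without locking out the other side.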