Module 8: Memory Consistency Models and Case Studies of Multi-core
  Lecture 16: Case Studies of Multi-core
 


[Figure: POWER4 die photo]

IBM POWER5

  • Carries the POWER4 design forward to the next generation
    • Each core of the dual-core chip is 2-way SMT (simultaneous multithreading), at a cost of 24% area growth per core
    • More than two threads per core would add complexity and might not provide any extra performance benefit; performance may even degrade because of resource contention and cache thrashing unless all shared resources are scaled up accordingly (the design hits a complexity wall)
    • The L3 cache is moved to the processor side so that the L2 cache can talk to it directly: this reduces bandwidth demand on the interconnect (L3 hits, at least, do not go on the bus)
    • This change enabled POWER5 designers to scale to 64-processor systems (i.e., 32 dual-core chips with a total of 128 threads)
    • Bigger L2 and L3 caches: 1.875 MB L2, 36 MB L3
    • On-chip memory controller
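
The scaling figures in the list above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, and the per-thread L2 share assumes that the 1.875 MB L2 is shared by every hardware thread on a chip, which the bullets do not state explicitly.

```python
# Illustrative arithmetic only. The function and variable names are
# hypothetical; the input figures come from the bullets above.

def system_totals(chips, cores_per_chip, threads_per_core):
    """Return (total cores, total hardware threads) for the whole system."""
    cores = chips * cores_per_chip
    threads = cores * threads_per_core
    return cores, threads

# 32 dual-core chips, each core 2-way SMT.
cores, threads = system_totals(chips=32, cores_per_chip=2, threads_per_core=2)
print(f"cores (processors): {cores}")
print(f"hardware threads:   {threads}")

# ASSUMPTION: the 1.875 MB L2 is shared by all hardware threads on one chip.
L2_MB = 1.875
threads_per_chip = 2 * 2  # 2 cores per chip, 2 threads per core
print(f"L2 share per hardware thread: {L2_MB / threads_per_chip} MB")
```

With these inputs the function returns 64 cores and 128 hardware threads, matching the system size quoted above. The per-thread L2 figure is only a rough indicator of cache pressure under SMT, since a shared cache is not partitioned evenly in practice.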