Module 18: "TLP on Chip: HT/SMT and CMP"
  Lecture 40: "Case Studies: IBM Power4 and IBM Power5"
 

IBM POWER5

  • Carries on POWER4 to the next generation
    • Each core of the dual-core chip is 2-way SMT; adding SMT grows the area of each core by 24%
    • Supporting more than two threads per core not only adds complexity but may provide no extra performance benefit; performance may even degrade because of resource contention and cache thrashing unless all shared resources are scaled up accordingly (the design hits a complexity wall)
    • L3 cache is moved to the processor side so that the L2 cache can talk to it directly; this reduces bandwidth demand on the interconnect (L3 hits, at least, do not go on the bus)
    • This change enabled POWER5 designers to scale to 64-processor systems (i.e. 32 dual-core chips, for a total of 128 hardware threads)
    • Bigger L2 and L3 caches: 1.875 MB L2, 36 MB L3
    • On-chip memory controller

    Reproduced from IEEE Micro
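
    The system-size figures quoted above (32 chips, 64 processors, 128 threads) follow from simple multiplication. The short Python sketch below only restates that arithmetic; the function name system_size is a hypothetical helper introduced for illustration, not anything from IBM documentation.

```python
# Largest POWER5 configuration described in the notes above.
CHIPS = 32             # chips in the largest system (from the text)
CORES_PER_CHIP = 2     # POWER5 is a dual-core chip
THREADS_PER_CORE = 2   # each core is 2-way SMT


def system_size(chips, cores_per_chip, threads_per_core):
    """Return (cores, hardware threads) for a system (hypothetical helper)."""
    cores = chips * cores_per_chip
    return cores, cores * threads_per_core


cores, threads = system_size(CHIPS, CORES_PER_CHIP, THREADS_PER_CORE)
print(cores, threads)  # 64 128
```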
  • Same pipeline structure as POWER4
    • Added SMT facility
    • Like the Pentium 4, it fetches from the two threads in alternate cycles (up to 8 instructions per cycle, just like POWER4)
    • Threads share ITLB and ICache
    • Increased the size of the register file compared to POWER4 to support two threads: 120 integer and 120 floating-point registers (POWER4 has 80 integer and 72 floating-point registers). This also improves single-thread performance compared to POWER4. The smaller technology (0.13 μm) made it possible to access the bigger register file in the same or shorter time, so the pipeline stays the same as in POWER4
    • Doubled the associativity of the L1 caches to reduce conflict misses: the icache is 2-way and the dcache is 4-way set-associative
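
The alternate-cycle fetch policy mentioned in the list above can be sketched as a toy simulation. This is a minimal model under stated assumptions, not the actual POWER5 fetch logic: it assumes strict alternation even when the selected thread has nothing left to fetch, and it ignores cache misses, branches, and thread priorities. The function name alternate_fetch and the instruction counts are hypothetical.

```python
FETCH_WIDTH = 8  # instructions fetched per cycle (from the text)


def alternate_fetch(remaining, cycles, width=FETCH_WIDTH):
    """Simulate strict alternate-cycle fetch between two threads.

    remaining: two ints, the instructions each thread still has to fetch.
    Returns a list of (cycle, thread, fetched_count) tuples.
    Assumption: a cycle is wasted if the selected thread has nothing to fetch.
    """
    remaining = list(remaining)
    trace = []
    for cycle in range(cycles):
        thread = cycle % 2                 # even cycles: thread 0, odd: thread 1
        n = min(width, remaining[thread])  # all fetched instructions come from ONE thread
        remaining[thread] -= n
        trace.append((cycle, thread, n))
    return trace


trace = alternate_fetch([20, 12], cycles=6)
for cycle, thread, n in trace:
    print(f"cycle {cycle}: thread {thread} fetched {n}")
```

In this model each thread sees at most half of the fetch cycles, which is one reason a single thread running alone in SMT mode would not automatically get the full front-end bandwidth.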
Reproduced from IEEE Micro
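
To illustrate why higher associativity reduces conflict misses, the sketch below counts misses in a tiny set-associative cache with LRU replacement. The cache size (8 lines of 64 bytes), the access pattern, and the function name count_misses are all hypothetical and chosen only to make the effect visible; they are not POWER4 or POWER5 parameters, and LRU is an assumed replacement policy.

```python
from collections import OrderedDict


def count_misses(addresses, num_lines, ways, line_size=64):
    """Count misses in a set-associative cache with LRU replacement.

    num_lines is the total number of cache lines, so capacity stays fixed
    while `ways` varies; ways=1 models a direct-mapped cache.
    """
    num_sets = num_lines // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // line_size
        s = sets[block % num_sets]
        if block in s:
            s.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1
            if len(s) == ways:
                s.popitem(last=False)  # evict the least recently used line
            s[block] = True
    return misses


# Two addresses exactly one cache capacity apart, accessed alternately.
NUM_LINES, LINE = 8, 64
a, b = 0, NUM_LINES * LINE
pattern = [a, b] * 10

direct_mapped = count_misses(pattern, NUM_LINES, ways=1)
two_way = count_misses(pattern, NUM_LINES, ways=2)
print(direct_mapped, two_way)  # 20 2
```

With one way, the two blocks map to the same line and evict each other on every access; with two ways and the same capacity, both fit in one set and only the two cold misses remain.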