|
IBM POWER5
- Carries the POWER4 design forward to the next generation
- Each core of the dual-core chip is 2-way SMT: 24% area growth per core
- More than two threads per core add complexity and may not improve performance; performance can even degrade because of resource contention and cache thrashing unless all shared resources are scaled up accordingly (the design hits a complexity wall)
- L3 cache is moved to the processor side so that the L2 cache can talk to it directly: this reduces bandwidth demand on the interconnect (L3 hits, at least, no longer go on the bus)
- This change enabled POWER5 designers to scale to 64-processor systems (i.e., 32 dual-core chips with a total of 128 threads)
- Bigger L2 and L3 caches: 1.875 MB L2, 36 MB L3
- On-chip memory controller
Reproduced from IEEE Micro |
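The system-scale figures in the list above follow from simple multiplication. A minimal sketch of that arithmetic, using only the numbers stated in the bullets (the variable names are illustrative, not IBM terminology):

```python
# Figures taken from the bullets above; names are illustrative only.
CHIPS = 32             # chips in the largest configuration
CORES_PER_CHIP = 2     # dual-core chip
THREADS_PER_CORE = 2   # 2-way SMT

cores = CHIPS * CORES_PER_CHIP        # "64-processor" system
threads = cores * THREADS_PER_CORE    # hardware threads visible to software

print(f"{cores} cores, {threads} hardware threads")
```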
- Same pipeline structure as POWER4
- Added SMT facility
- Like Pentium 4, fetches from each thread in alternate cycles (8-instruction fetch per cycle just like POWER4)
- Threads share ITLB and ICache
- Increased the size of the register files compared to POWER4 to support two threads: 120 integer and 120 floating-point registers (POWER4 has 80 integer and 72 floating-point registers); this also improves single-thread performance compared to POWER4. The smaller technology (0.13 μm) made it possible to access a bigger register file in the same or shorter time, so the pipeline stays the same as in POWER4
- Doubled associativity of L1 caches to reduce conflict misses: icache is 2-way and dcache is 4-way
Reproduced from IEEE Micro |
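To see why doubling associativity reduces conflict misses, a toy cache model can help. The sketch below is not a model of the POWER5 L1 caches: the cache size (8 lines), the 128-byte line size, the LRU replacement policy, and the function name `count_misses` are all assumptions chosen for illustration. It replays a trace that alternates between two addresses mapping to the same set.

```python
from collections import OrderedDict

def count_misses(addresses, num_lines, ways, line_size):
    """Count misses in a toy set-associative cache with LRU replacement."""
    num_sets = num_lines // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // line_size
        index = block % num_sets       # which set the block maps to
        tag = block // num_sets        # identifies the block within the set
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)     # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return misses

# Toy parameters (assumed, not POWER5 values).
NUM_LINES = 8
LINE_SIZE = 128

# Two addresses exactly one cache size apart always collide in the same set.
a = 0
b = NUM_LINES * LINE_SIZE
trace = [a, b] * 10

direct_mapped = count_misses(trace, NUM_LINES, ways=1, line_size=LINE_SIZE)
two_way = count_misses(trace, NUM_LINES, ways=2, line_size=LINE_SIZE)
print(direct_mapped, two_way)
```

With one way, the two blocks keep evicting each other and every access misses; with two ways, both blocks fit in the same set and only the first access to each one misses.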