Module 18: "TLP on Chip: HT/SMT and CMP"

Lecture 40: "Case Studies: IBM Power4 and IBM Power5"

POWER4 core

8-wide fetch, 8-wide issue, 5-wide commit
- Features out-of-order issue with renaming and branch prediction (bimodal+gshare hybrid)
- Allows 20 groups of at most 5 instructions each to be in-flight beyond dispatch (100 instructions)
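
The in-flight limit above is simple arithmetic: 20 groups of at most 5 instructions gives 100 instructions. A minimal sketch of group-granularity tracking follows. The class `GroupTracker` and its method names are hypothetical, chosen for illustration only. It models capacity and in-order retirement of whole groups, not the actual POWER4 hardware structures.

```python
from collections import deque

MAX_GROUPS = 20   # from the slide: at most 20 groups in flight
GROUP_SIZE = 5    # from the slide: at most 5 instructions per group


class GroupTracker:
    """Hypothetical model: in-flight instructions tracked per group."""

    def __init__(self):
        self.groups = deque()  # oldest group sits at the left end

    def dispatch(self, instrs):
        """Try to dispatch one group; return False if all 20 slots are busy."""
        if not 1 <= len(instrs) <= GROUP_SIZE:
            raise ValueError("a group holds between 1 and 5 instructions")
        if len(self.groups) >= MAX_GROUPS:
            return False  # dispatch stalls until the oldest group retires
        self.groups.append(list(instrs))
        return True

    def retire_oldest(self):
        """Retire the oldest group as a unit (assumed in-order by group)."""
        return self.groups.popleft() if self.groups else None

    def in_flight(self):
        """Count instructions currently beyond dispatch."""
        return sum(len(g) for g in self.groups)
```

Filling all 20 slots with full groups gives `in_flight() == 100`. A partially filled group still occupies a whole slot, so the effective window can be smaller than 100 instructions.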

POWER4 pipeline

Relatively short pipe
- Clocked at more than 1 GHz in 0.18 μm technology
- Minimum 15 cycles for integer instructions
- Minimum 12-cycle branch misprediction penalty
- 11 small parallel issue queues (divided into four groups) for fast selection
- Back-to-back issue of dependent instructions is not allowed, so at least a one-cycle gap is required (is the bypass slow, or absent?)
- Out-of-order load issue, load-load and load-store replay; load-load replay optimized with load queue snoop bit
- Write-through, write-no-allocate private L1 data cache; at most 8 outstanding L1 load misses
- Inclusion maintained between L2 and L1
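
The last two bullets can be illustrated with a toy model of the cache policies. This is only a sketch under loud assumptions: the class `TinyHierarchy` is hypothetical, capacities are unbounded, data values are ignored, and the L2 is assumed to allocate a line on a store. None of these details come from the slides.

```python
class TinyHierarchy:
    """Hypothetical two-level model that tracks only which lines are present."""

    def __init__(self):
        self.l1 = set()  # line addresses present in the private L1
        self.l2 = set()  # line addresses present in the L2

    def load(self, line):
        """A load miss fills L2 and then L1, so L1 stays a subset of L2."""
        if line in self.l1:
            return "L1 hit"
        self.l2.add(line)
        self.l1.add(line)
        return "L1 miss"

    def store(self, line):
        """Write-through: every store reaches L2.
        Write-no-allocate: a store miss does not bring the line into L1."""
        self.l2.add(line)  # assumption: the L2 allocates on a store
        return "L1 hit" if line in self.l1 else "L1 miss, not allocated"

    def evict_l2(self, line):
        """Evicting from L2 also invalidates L1 to preserve inclusion."""
        self.l2.discard(line)
        self.l1.discard(line)

    def inclusive(self):
        """Check the inclusion property: every L1 line is also in L2."""
        return self.l1 <= self.l2
```

Because L1 is write-through, it never holds the only up-to-date copy of a line, so invalidating an L1 line on an L2 eviction needs no write-back in this model.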