Module 18: "TLP on Chip: HT/SMT and CMP"
  Lecture 40: "Case Studies: IBM Power4 and IBM Power5"
 

POWER4 caches

  • Private L1 instruction and data caches (on chip)
    • L1 icache: 64 KB/direct mapped/128-byte line
    • L1 dcache: 32 KB/2-way associative/128-byte line/LRU
    • No M state in L1 data cache (it is write-through, so L1 never holds the only up-to-date copy)
  • On-chip shared L2 (on-chip coherence point)
    • 1.5 MB/8-way associative/128-byte line/pseudo-LRU
    • For on-chip coherence, each L2 tag is augmented with a two-bit sharer vector; the vector is used to invalidate a core’s L1 copy when the other core writes the line
    • Three L2 controllers, each with four local coherence units; each controller handles roughly 512 KB of data divided into four SRAM partitions
    • For off-chip coherence, each L2 controller has four snoop engines; the L2 executes an enhanced MESI protocol with seven states
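
The cache parameters listed above fix how an address is split into line offset and set index. The following is a minimal sketch of that arithmetic, not a description of the actual POWER4 address mapping. It assumes the nominal figure of 512 KB per L2 controller and treats each controller as an independent 8-way slice; the helper name geometry is hypothetical.

```python
KB = 1024

def geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes the number of sets and the line size are powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2(line size)
    index_bits = sets.bit_length() - 1          # log2(number of sets)
    return sets, offset_bits, index_bits

# L1 icache: 64 KB, direct mapped (1 way), 128-byte line
l1i = geometry(64 * KB, 1, 128)

# L1 dcache: 32 KB, 2-way, 128-byte line
l1d = geometry(32 * KB, 2, 128)

# One L2 controller slice: assumed nominal 512 KB, 8-way, 128-byte line
l2_slice = geometry(512 * KB, 8, 128)

for name, g in [("L1 icache", l1i), ("L1 dcache", l1d), ("L2 slice", l2_slice)]:
    print(f"{name}: {g[0]} sets, {g[1]} offset bits, {g[2]} index bits")
```

Under these assumptions the icache and each L2 slice have 512 sets (9 index bits), the dcache has 128 sets (7 index bits), and all three use 7 offset bits because of the common 128-byte line.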

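The two-bit sharer vector kept with each L2 tag can be illustrated with a small model. This is a sketch under stated assumptions, not the POWER4 hardware logic: the class L2Line and the functions l1_read_fill and core_write are hypothetical names, and the L1 store-allocation policy is not modeled (the writer's own bit is simply left unchanged).

```python
NUM_CORES = 2  # one sharer bit per core on the chip

class L2Line:
    """Hypothetical model of one L2 line's on-chip sharer state."""
    def __init__(self):
        self.sharers = 0b00  # bit i set => core i's L1 may hold the line

def l1_read_fill(line, core):
    """Core's L1 fetches the line from L2: record it as a sharer."""
    line.sharers |= 1 << core

def core_write(line, core):
    """A write-through store from `core` reaches the L2.

    Returns the list of other cores whose L1 copies must be invalidated,
    and clears their sharer bits.
    """
    mask = (1 << NUM_CORES) - 1
    others = line.sharers & ~(1 << core) & mask
    to_invalidate = [c for c in range(NUM_CORES) if (others >> c) & 1]
    line.sharers &= ~others  # invalidated L1s no longer hold the line
    return to_invalidate

line = L2Line()
l1_read_fill(line, 0)
l1_read_fill(line, 1)
first = core_write(line, 0)    # core 1 holds a copy, so it is invalidated
second = core_write(line, 0)   # no other sharer remains
```

Because the L1 data cache is write-through, every store is visible at the L2, which is why the L2 can act as the on-chip coherence point with only this vector per line.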
POWER4 L2 cache