Module 8: Memory Consistency Models and Case Studies of Multi-core
  Lecture 15: Memory Consistency Models and Case Studies of Multi-core
 


32-way: Ring Bus

POWER4 Caches

  • Private L1 instruction and data caches (on chip)
    • L1 icache: 64 KB/direct mapped/128-byte line
    • L1 dcache: 32 KB/2-way associative/128-byte line/LRU
    • L1 data cache is write-through, so it needs no M (modified) state
  • On-chip shared L2 (on-chip coherence point)
    • 1.5 MB/8-way associative/128-byte line/pseudo LRU
    • For on-chip coherence, each L2 tag is augmented with a two-bit sharer vector (one bit per core); it is used to invalidate the L1 copy when the other core writes
    • Three L2 controllers, each with four local coherence units; each L2 controller handles roughly 512 KB of data divided into four SRAM partitions
    • For off-chip coherence, each L2 controller has four snoop engines; the L2 runs an enhanced MESI protocol with seven states
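
The cache parameters above determine the number of sets and the offset bits of each cache, and the two-bit sharer vector can be modelled with a single bitmask. The following is a minimal sketch and not a description of the actual POWER4 hardware. All names (`CacheGeometry`, `L2Line`, `l1_fill`, `core_write`) are hypothetical, "roughly 512 KB" per L2 controller is taken as exactly 512 KB, and the write handling assumes that a store leaves the writer's own L1 copy in place if it already had one.

```python
from dataclasses import dataclass

KB = 1024
NUM_CORES = 2  # one sharer bit per core, as in the two-bit vector above


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical helper: derives set count and offset bits from size/ways/line."""
    size_bytes: int
    ways: int
    line_bytes: int

    @property
    def num_lines(self):
        return self.size_bytes // self.line_bytes

    @property
    def num_sets(self):
        return self.num_lines // self.ways

    @property
    def offset_bits(self):
        # Bits needed to address a byte within a line (line size is a power of two).
        return (self.line_bytes - 1).bit_length()


L1_ICACHE = CacheGeometry(64 * KB, ways=1, line_bytes=128)   # direct mapped
L1_DCACHE = CacheGeometry(32 * KB, ways=2, line_bytes=128)
# Assumption: one L2 controller's share is exactly 512 KB (the notes say "roughly").
L2_SLICE = CacheGeometry(512 * KB, ways=8, line_bytes=128)


class L2Line:
    """Simplified L2 tag entry holding only the sharer vector."""
    def __init__(self):
        self.sharers = 0  # bit i set => core i's L1 may hold this line


def l1_fill(line, core):
    """Record that `core` brought the line into its L1."""
    line.sharers |= 1 << core


def core_write(line, core):
    """Return the cores whose L1 copies must be invalidated when `core` writes.

    Assumption: the writer keeps its own L1 copy if it already had one,
    since the L1 is write-through and the L2 always sees the new data.
    """
    to_invalidate = [c for c in range(NUM_CORES)
                     if c != core and (line.sharers >> c) & 1]
    line.sharers &= 1 << core  # clear every bit except the writer's
    return to_invalidate
```

With these parameters the L1 icache has 512 sets, the L1 dcache has 128 sets, each 512 KB L2 slice has 512 sets, and every cache uses 7 offset bits for its 128-byte line. If both cores have filled a line and core 0 then writes it, `core_write` reports core 1 as the only L1 to invalidate.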