Module 18: "TLP on Chip: HT/SMT and CMP"
  Lecture 41: "Case Studies: Intel Montecito and Sun Niagara"
 

Cache hierarchy

  • L1 instruction cache
    • 16 KB / 4-way / 32-byte lines / random replacement
    • Fetches two instructions every cycle
    • If both fetched instructions are useful, the following cycle is free for an icache refill
  • L1 data cache
    • 8 KB / 4-way / 16-byte lines / write-through, no write-allocate
    • On average, a 10% miss rate on the target benchmarks
    • The L2 cache extends its tags to maintain a directory that keeps the cores' L1s coherent
  • L2 cache is write-back; clean lines are evicted silently
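The geometry and write policies above can be made concrete with a toy model. The Python sketch below derives set counts and address-bit widths from the listed sizes (assuming power-of-two parameters) and models an L1 D-cache with random replacement, write-through, and no write-allocate. It also includes a hypothetical L2 directory that records which cores' L1s may hold each line and invalidates the other copies when a store reaches the L2. The names `geometry`, `L1DCache`, and `L2Directory` are illustrative assumptions, not Niagara's hardware interfaces, and details such as how L1 evictions update the real directory are not modeled.

```python
import random

def geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits); sizes assumed to be powers of two."""
    sets = size_bytes // (ways * line_bytes)
    return sets, line_bytes.bit_length() - 1, sets.bit_length() - 1

class L1DCache:
    """Toy L1 D-cache: random replacement, write-through, no write-allocate."""
    def __init__(self, size_bytes=8 * 1024, ways=4, line_bytes=16, seed=0):
        self.sets, self.off_bits, self.idx_bits = geometry(size_bytes, ways, line_bytes)
        self.ways = ways
        self.tags = [[] for _ in range(self.sets)]  # up to `ways` tags per set
        self.rng = random.Random(seed)

    def split(self, addr):
        """Split an address into (set index, tag)."""
        index = (addr >> self.off_bits) & (self.sets - 1)
        return index, addr >> (self.off_bits + self.idx_bits)

    def lookup(self, addr):
        index, tag = self.split(addr)
        return tag in self.tags[index]

    def fill(self, addr):
        index, tag = self.split(addr)
        ways = self.tags[index]
        if len(ways) == self.ways:
            # Random victim; with write-through it is never dirty, so no writeback.
            ways.pop(self.rng.randrange(self.ways))
        ways.append(tag)

    def invalidate(self, addr):
        index, tag = self.split(addr)
        if tag in self.tags[index]:
            self.tags[index].remove(tag)

class L2Directory:
    """Hypothetical L2 model: per line, the set of cores whose L1 may hold it."""
    def __init__(self, l1s, line_bytes=16):
        self.l1s, self.line_bytes = l1s, line_bytes
        self.sharers = {}   # line number -> set of core ids
        self.l2_writes = 0  # stores that reached the L2

    def load(self, core, addr):
        l1 = self.l1s[core]
        if l1.lookup(addr):
            return "hit"
        l1.fill(addr)  # a load miss allocates the line in the L1
        self.sharers.setdefault(addr // self.line_bytes, set()).add(core)
        return "miss"

    def store(self, core, addr):
        self.l2_writes += 1  # write-through: every store reaches the L2
        line = addr // self.line_bytes
        # Invalidate other cores' copies. A stale sharer (silently evicted L1 line)
        # only causes a harmless extra invalidation in this sketch.
        for other in self.sharers.get(line, set()) - {core}:
            self.l1s[other].invalidate(addr)
        self.sharers[line] = self.sharers.get(line, set()) & {core}
        # No write-allocate: a store miss does not bring the line into the L1.
        return "hit" if self.l1s[core].lookup(addr) else "miss"
```

With these parameters both L1 caches have 128 sets: `geometry(16 * 1024, 4, 32)` gives `(128, 5, 7)` for the I-cache and `geometry(8 * 1024, 4, 16)` gives `(128, 4, 7)` for the D-cache.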

Thread selection

  • Based on long-latency events such as loads, divides, multiplies, and branches
  • Also based on pipeline stalls due to cache misses, traps, or structural hazards
  • Instructions that depend on a load are issued speculatively (assuming a hit), with low priority
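The selection rules above can be sketched as a per-cycle choice among a core's threads. This Python sketch is an illustration under assumptions, not Niagara's actual select logic: each hypothetical `Thread` records the cycle at which its current long-latency event or stall resolves, and whether its next instruction depends on a load assumed to hit. Ready non-speculative threads win, speculative ones issue only when nothing else is ready, and ties are broken round-robin after the last selected thread, a tie-breaking policy assumed here.

```python
from dataclasses import dataclass

@dataclass
class Thread:
    tid: int
    blocked_until: int = 0     # cycle when a long-latency event or stall resolves
    speculative: bool = False  # next instruction depends on a load assumed to hit

def select_thread(threads, cycle, last):
    """Return the tid to issue this cycle, or None if every thread is blocked.

    Blocked threads are skipped. Non-speculative ready threads are preferred;
    load-dependent (speculative) threads issue only at low priority. Within a
    class, the search starts after `last` (assumed round-robin tie-break).
    """
    n = len(threads)
    order = [threads[(last + 1 + i) % n] for i in range(n)]
    ready = [t for t in order if t.blocked_until <= cycle]
    for t in ready:
        if not t.speculative:
            return t.tid
    return ready[0].tid if ready else None
```

For example, with thread 0 blocked until cycle 5, thread 3 blocked until cycle 3, thread 1 speculative, and thread 2 ready, a call at cycle 0 after thread 2 last issued selects thread 2, while the same call at cycle 4 selects thread 3 because its event has resolved.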