Module 18: "TLP on Chip: HT/SMT and CMP"
  Lecture 41: "Case Studies: Intel Montecito and Sun Niagara"
 

Intel Montecito

Features

  • Dual-core Itanium 2, each core dual-threaded
  • 1.7 billion transistors, 21.5 mm x 27.7 mm die
  • 27 MB of on-chip cache in three levels
    • Not shared among cores
  • 1.8+ GHz, 100 W
  • Single-thread enhancements
    • Extra shifter improves performance of crypto codes by 100%
    • Improved branch prediction
    • Improved data and control speculation recovery
    • Separate L2 instruction and data caches buy a 7% improvement over Itanium 2; the L2I is four times bigger (1 MB)
    • Asynchronous 12 MB L3 cache
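
The 27 MB figure above can be sanity-checked with a little arithmetic. The sketch below is only illustrative: the 1 MB L2I and 12 MB L3 sizes come from the list, and they are taken as per-core sizes because the caches are not shared. The L1 and L2D sizes are assumptions that these notes do not state.

```python
# Per-core cache sizes in KB.
# L2I and L3 are taken from the feature list; the L1I, L1D and L2D
# sizes are assumptions that do not appear in these notes.
PER_CORE_KB = {
    "L1I": 16,          # assumed
    "L1D": 16,          # assumed
    "L2I": 1024,        # 1 MB, from the list
    "L2D": 256,         # assumed
    "L3": 12 * 1024,    # 12 MB, from the list
}
CORES = 2  # caches are private, so the chip total is twice the per-core sum

per_core_mb = sum(PER_CORE_KB.values()) / 1024
total_mb = CORES * per_core_mb
print(f"per core: {per_core_mb:.2f} MB, chip total: {total_mb:.2f} MB")
```

Under these assumptions the chip total comes to a little under 27 MB, which is consistent with the rounded figure quoted in the list.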

Overview

[Figure omitted. Reproduced from IEEE Micro]

Dual threads

  • SMT only for cache, not for core resources
    • Simulations showed high resource utilization at core level, but low utilization of cache
    • Branch predictor is still shared but uses thread ID tags
    • Thread switch is implemented by flushing the pipe
      • More like coarse-grain multithreading
    • Five thread switch events
      • L3 cache miss (immense impact on in-order pipe) / L3 cache refill
      • Quantum expiry
      • Spin lock / ALAT invalidation
      • Software-directed switch
      • Execution in low-power mode
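
The switch-on-event behaviour described above can be sketched as a toy model. This is a minimal illustration, not Montecito's actual control logic: the class name, the event names, and the 15-cycle flush penalty are all hypothetical, and every event is assumed to trigger an unconditional switch, which is a simplification.

```python
from enum import Enum, auto


class SwitchEvent(Enum):
    """The five thread-switch events listed above (names are illustrative)."""
    L3_MISS_OR_REFILL = auto()
    QUANTUM_EXPIRY = auto()
    SPIN_LOCK_OR_ALAT_INVALIDATION = auto()
    SOFTWARE_DIRECTED = auto()
    LOW_POWER_MODE = auto()


FLUSH_PENALTY = 15  # hypothetical cost, in cycles, of flushing the pipe


class DualThreadCore:
    """Toy coarse-grain multithreaded core: one thread owns the pipe at a time."""

    def __init__(self):
        self.active = 0    # id of the thread currently using the pipe
        self.cycles = 0    # total elapsed cycles
        self.switches = 0  # number of thread switches so far

    def run(self, n_cycles):
        """Let the active thread execute for n_cycles."""
        self.cycles += n_cycles

    def on_event(self, event):
        """Flush the pipe and hand it to the other thread (simplified policy)."""
        if not isinstance(event, SwitchEvent):
            raise TypeError("expected a SwitchEvent")
        self.cycles += FLUSH_PENALTY  # the flush is what makes this coarse-grain
        self.active ^= 1              # toggle between thread 0 and thread 1
        self.switches += 1
        return self.active


core = DualThreadCore()
core.run(100)
core.on_event(SwitchEvent.L3_MISS_OR_REFILL)  # thread 0 stalls, thread 1 takes over
core.run(50)
core.on_event(SwitchEvent.QUANTUM_EXPIRY)     # time slice ends, back to thread 0
print(core.active, core.cycles, core.switches)
```

Because each switch pays the flush penalty, switching is only worthwhile on long-latency events such as an L3 miss, which is why the design resembles coarse-grain multithreading rather than cycle-by-cycle SMT in the core.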