Module 18: "TLP on Chip: HT/SMT and CMP"
  Lecture 41: "Case Studies: Intel Montecito and Sun Niagara"
 

Sun Niagara (UltraSPARC T1)

Features

  • Eight cores (one pipeline each), each shared by four threads
    • 32-way multithreading on a single chip
    • Initial clock frequency of 1.2 GHz; consumes about 60 W
    • Shared 3 MB L2 cache: four banks, 12-way set associative, 200 GB/s bandwidth
    • Single-issue, six-stage pipeline
    • Target market is web services, where ILP is limited but TLP is abundant (independent transactions)
      • Aggregate throughput matters more than single-thread latency
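The feature list above packs in several numbers that combine with a little arithmetic. The Python sketch below derives the hardware-thread count, the peak single-issue throughput, and the L2 set count and address split. It assumes 64-byte L2 lines and that the bank is selected by the two address bits just above the line offset; neither detail is stated on the slide, and the function name `decompose` is hypothetical.

```python
# Sketch: derived numbers for Niagara's on-chip resources.
# ASSUMPTIONS (not on the slide): 64-byte L2 lines, and bank selection by the
# address bits immediately above the line offset.

CORES = 8
THREADS_PER_CORE = 4
CLOCK_HZ = 1_200_000_000
L2_BYTES = 3 * 1024 * 1024
L2_BANKS = 4
L2_WAYS = 12
LINE_BYTES = 64  # assumed line size

hw_threads = CORES * THREADS_PER_CORE             # 32-way multithreading
peak_ips = CORES * 1 * CLOCK_HZ                   # single issue per core: 9.6e9 instr/s
sets_total = L2_BYTES // (LINE_BYTES * L2_WAYS)   # 3 MB / (64 B * 12 ways) = 4096 sets
sets_per_bank = sets_total // L2_BANKS            # 1024 sets per bank

# All quantities are powers of two, so field widths are exact log2 values.
offset_bits = LINE_BYTES.bit_length() - 1         # 6
bank_bits = L2_BANKS.bit_length() - 1             # 2
index_bits = sets_per_bank.bit_length() - 1       # 10


def decompose(addr: int) -> tuple[int, int, int, int]:
    """Split a physical address into (bank, set index, tag, line offset)."""
    offset = addr & (LINE_BYTES - 1)
    bank = (addr >> offset_bits) & (L2_BANKS - 1)
    index = (addr >> (offset_bits + bank_bits)) & (sets_per_bank - 1)
    tag = addr >> (offset_bits + bank_bits + index_bits)
    return bank, index, tag, offset


if __name__ == "__main__":
    print(hw_threads, peak_ips, sets_total, sets_per_bank)
    print(decompose(0x12345678))  # (1, 86, 1165, 56)
```

Interleaving consecutive lines across banks in this way lets independent threads hit different banks in parallel, which is one plausible reason for banking a cache shared by 32 threads.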

Pipeline details

[Figure: Niagara core pipeline, reproduced from IEEE Micro]
  • Four threads share a six-stage pipeline
    • Shared L1 caches and TLBs
    • Dedicated register file per thread
    • Fetches two instructions every cycle from a selected thread
    • Thread-select logic also determines which thread’s instruction is fed into the rest of the pipeline
    • Although the pipeline is in-order, each thread has an 8-entry store buffer (why?)
    • Threads may run into structural hazards due to the limited number of functional units (FUs)
      • The divider is granted to the least recently executed thread
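The divider rule above is a least-recently-executed priority. The Python sketch below models it with a per-core record of the cycle at which each thread last executed. The class name `LRUSelector` and its methods are hypothetical, and using the same rule for ordinary thread selection is an assumption; the slide states it only for the divider. The model also simplifies by granting one candidate per call.

```python
from __future__ import annotations


class LRUSelector:
    """Hypothetical per-core arbiter: grant to the least recently executed thread."""

    def __init__(self, n_threads: int = 4) -> None:
        # last_exec[t] = cycle at which thread t last executed; -1 means never.
        self.last_exec = [-1] * n_threads

    def pick(self, candidates: set[int], cycle: int) -> int | None:
        """Grant to the candidate that executed least recently, or None if empty.

        Ties (e.g. threads that have never run) go to the lowest thread id.
        The winner's history is updated because the grant means it executes now.
        """
        if not candidates:
            return None
        tid = min(candidates, key=lambda t: (self.last_exec[t], t))
        self.last_exec[tid] = cycle
        return tid


if __name__ == "__main__":
    # All four threads ready every cycle: the policy degenerates to round-robin.
    sel = LRUSelector(4)
    print([sel.pick({0, 1, 2, 3}, c) for c in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]

    # Divider contention: thread 3 ran at cycle 0, thread 1 at cycle 1.
    div = LRUSelector(4)
    div.pick({3}, 0)
    div.pick({1}, 1)
    print(div.pick({1, 3}, 2))  # 3, the least recently executed requester
```

A thread that stalls, for example on a cache miss, simply drops out of the candidate set, so its slots go to the other threads.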