Technology trends
- The natural building block for multiprocessors is the microprocessor
- Microprocessor performance increases 50% every year
- Transistor count doubles every 18 months
- Intel Pentium 4 EE 3.4 GHz has 178 M transistors on a 237 mm² die
- 130 nm Itanium 2 has 410 M transistors on a 374 mm² die
- 90 nm Intel Montecito has 1.7 B transistors on a 596 mm² die
- Die area is also growing
- Intel Prescott had 125 M transistors on a 112 mm² die
- Ever-shrinking process technology
- Shorter gate length of transistors
- Electrons can be swept through the channel faster
- Transistors can be clocked at a faster rate
- Transistors also get smaller
- Can afford to pack more on the die
- And die size is also increasing
- What to do with so many transistors?
- Could increase L2 or L3 cache size
- Does not help much beyond a certain point
- Burns more power
- Could improve microarchitecture
- Better branch predictor or novel designs to improve instruction-level parallelism (ILP)
- If single-thread performance cannot be improved, look to thread-level parallelism (TLP)
- Multiple cores on the die (chip multiprocessors): IBM POWER4, POWER5, Intel Montecito, Intel Pentium D, AMD Opteron, Sun UltraSPARC IV
- TLP on chip
- Instead of adding multiple cores, could add extra resources and logic to run multiple threads simultaneously on one core (simultaneous multi-threading, SMT): Alpha 21464 (cancelled), Intel Pentium 4, IBM POWER5, Intel Montecito
- Today’s microprocessors are small-scale multiprocessors (dual-core, 2-way SMT)
- Tomorrow’s microprocessors will be larger-scale multiprocessors or highly multi-threaded
- Sun Niagara is an 8-core (each 4-way threaded) chip: 32 threads on a single chip
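The growth figures above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original notes: it takes the transistor counts and die areas exactly as quoted in the bullets, and the helper names (`density`, `growth_factor`, `compound`) are made up for illustration.

```python
# Transistor counts and die areas (mm^2) as quoted in the bullets above.
chips = {
    "Pentium 4 EE (3.4 GHz)": (178e6, 237),
    "Prescott": (125e6, 112),
    "Itanium 2 (130 nm)": (410e6, 374),
    "Montecito (90 nm)": (1.7e9, 596),
}


def density(transistors, area_mm2):
    """Transistors per square millimetre of die."""
    return transistors / area_mm2


def growth_factor(years, doubling_months=18):
    """Multiplicative growth after `years` if a quantity doubles every `doubling_months`."""
    return 2 ** (12 * years / doubling_months)


def compound(rate_per_year, years):
    """Multiplicative growth after `years` at a fixed annual rate (0.5 means 50%)."""
    return (1 + rate_per_year) ** years


for name, (count, area) in chips.items():
    print(f"{name}: {density(count, area) / 1e6:.2f} M transistors/mm^2")

# Doubling every 18 months, expressed as an annual rate.
print(f"Transistor growth per year: {(growth_factor(1) - 1) * 100:.0f}%")
print(f"Transistor growth over 3 years: {growth_factor(3):.1f}x")
print(f"Performance growth over 3 years at 50%/year: {compound(0.5, 3):.3f}x")

# Sun Niagara: 8 cores, each 4-way threaded.
print(f"Niagara hardware threads: {8 * 4}")
```

Under these assumptions, doubling every 18 months works out to roughly 59% growth per year, or 4x over three years, while 50% annual performance growth compounds to about 3.4x over the same period.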
Architectural trends
- Circuits: bit-level parallelism
- Started with 4 bits (Intel 4004) [http://www.intel4004.com/]
- 32-bit processors are now the norm
- 64-bit processors are taking over (AMD Opteron, Intel Itanium, Pentium 4 family); started with Alpha, MIPS, Sun families
- Architecture: instruction-level parallelism (ILP)
- Extract independent instruction stream
- Key to advanced microprocessor design
- Gradually hitting a limit: memory wall
- Memory operations are the bottleneck
- Need memory-level parallelism (MLP)
- Technology limits such as wire delay are also pushing toward more distributed control, rather than the centralized control in today’s processors
- If ILP cannot be boosted, what can be done?
- Thread-level parallelism (TLP)
- Explicit parallel programs already have TLP (inherent)
- Sequential programs that are hard to parallelize or ILP-limited can be speculatively parallelized in hardware
- Thread-level speculation (TLS)
- Today’s trend: if nothing more can be done to boost single-thread performance, invest transistors and resources in exploiting TLP
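To make "explicit parallel programs already have TLP" concrete, here is a minimal sketch of an explicitly parallel reduction. It is an illustration only: the function names (`chunk_bounds`, `partial_sum`, `parallel_sum`) are hypothetical, and the work is split across software threads with Python's standard `concurrent.futures.ThreadPoolExecutor`. In standard CPython the global interpreter lock keeps these threads from running pure-Python arithmetic in parallel, so the sketch shows the decomposition into independent threads rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, num_threads):
    """Split range(n) into num_threads contiguous, nearly equal [start, end) pieces."""
    base, extra = divmod(n, num_threads)
    bounds, start = [], 0
    for t in range(num_threads):
        # The first `extra` chunks each take one leftover element.
        end = start + base + (1 if t < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def partial_sum(data, start, end):
    """Work done by one thread: sum its own slice, writing no shared state."""
    return sum(data[start:end])


def parallel_sum(data, num_threads=4):
    """Sum `data` by giving each thread an independent slice, then combining."""
    bounds = chunk_bounds(len(data), num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = pool.map(lambda b: partial_sum(data, *b), bounds)
        # The only sequential step: combine one partial result per thread.
        return sum(partials)


if __name__ == "__main__":
    data = list(range(1_000_000))
    print(parallel_sum(data, num_threads=4) == sum(data))
```

Each thread works on its own slice with no dependence on the others, which is the property a multi-core or SMT chip can exploit. The final combination step stays sequential.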