Module 2: "Parallel Computer Architecture: Today and Tomorrow"
  Lecture 4: "Shared Memory Multiprocessors"
 

Technology trends

  • The natural building block for multiprocessors is the microprocessor
  • Microprocessor performance has been increasing by roughly 50% every year
  • Transistor count doubles every 18 months
    • Intel Pentium 4 EE 3.4 GHz has 178 M transistors on a 237 mm² die
    • 130 nm Itanium 2 has 410 M transistors on a 374 mm² die
    • 90 nm Intel Montecito has 1.7 B transistors on a 596 mm² die
  • Die area is also growing
    • Intel Prescott had 125 M transistors on a 112 mm² die
  • Ever-shrinking process technology
    • Transistor gate lengths get shorter
    • Electrons can be swept through the channel faster
    • Transistors can be clocked at a faster rate
    • Transistors also get smaller
    • More of them can be packed on the die
    • And die size is also increasing
  • What to do with so many transistors?
  • Could increase L2 or L3 cache size
    • Does not help much beyond a certain point
    • Burns more power
  • Could improve microarchitecture
    • Better branch predictor or novel designs to improve instruction-level parallelism (ILP)
  • If single-thread performance cannot be improved, we have to look for thread-level parallelism (TLP)
    • Multiple cores on the die (chip multiprocessors): IBM POWER4, POWER5, Intel Montecito, Intel Pentium D, AMD Opteron, Sun UltraSPARC IV
  • TLP on chip
    • Instead of adding more cores, extra resources and logic can be added to a core so that it runs multiple threads simultaneously (simultaneous multi-threading, SMT): Alpha 21464 (cancelled), Intel Pentium 4, IBM POWER5, Intel Montecito
  • Today’s microprocessors are small-scale multiprocessors (dual-core, 2-way SMT)
  • Tomorrow’s microprocessors will be larger-scale multiprocessors or highly multi-threaded
    • Sun Niagara is an 8-core (each 4-way threaded) chip: 32 threads on a single chip
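
The figures quoted above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original lecture: it uses only the transistor counts and die areas listed in this section, and the helper names (density_m_per_mm2, annual_growth) are hypothetical. It computes the transistor density of each chip and converts "doubling every 18 months" into an annual growth factor, so that it can be compared with the 50% per year performance trend.

```python
# Transistor counts (in millions) and die areas (in mm^2) as quoted in this section.
CHIPS = {
    "Pentium 4 EE 3.4 GHz": (178, 237),
    "Itanium 2 (130 nm)": (410, 374),
    "Montecito (90 nm)": (1700, 596),
    "Prescott": (125, 112),
}


def density_m_per_mm2(transistors_m, area_mm2):
    """Return millions of transistors per square millimetre."""
    return transistors_m / area_mm2


def annual_growth(doubling_months):
    """Return the yearly growth factor implied by a doubling period in months."""
    return 2 ** (12 / doubling_months)


if __name__ == "__main__":
    for name, (count, area) in CHIPS.items():
        print(f"{name}: {density_m_per_mm2(count, area):.2f} M transistors/mm^2")
    print(f"Doubling every 18 months = {annual_growth(18):.2f}x per year")
```

Doubling every 18 months works out to about 59% growth per year, so under these figures the transistor budget grows somewhat faster than the quoted 50% per year performance trend. That gap is one way to read the question of what to do with the extra transistors.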

Architectural trends

  • Circuits: bit-level parallelism
    • Started with 4 bits (Intel 4004) [http://www.intel4004.com/]
    • 32-bit processors are now the norm
    • 64-bit processors are taking over (AMD Opteron, Intel Itanium, Pentium 4 family); started with Alpha, MIPS, Sun families
  • Architecture: instruction-level parallelism (ILP)
    • Extract independent instructions from a sequential instruction stream
    • Key to advanced microprocessor design
    • Gradually hitting a limit: the memory wall
    • Memory operations are the bottleneck
    • Need memory-level parallelism (MLP)
    • Technology limits such as wire delay are also pushing toward more distributed control, rather than the centralized control in today’s processors
  • If ILP cannot be boosted, what can be done?
  • Thread-level parallelism (TLP)
    • Explicitly parallel programs already have inherent TLP
    • Sequential programs that are hard to parallelize or ILP-limited can be speculatively parallelized in hardware
      • Thread-level speculation (TLS)
  • Today’s trend: if nothing more can be done to boost single-thread performance, invest transistors and resources in exploiting TLP
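
The bit-level parallelism point at the top of this list can be made concrete with a small sketch. This is an illustrative software model, not a description of any real datapath: the function name add_with_narrow_alu and its parameters are hypothetical. It adds two unsigned operands by passing them through an adder of a given width one slice at a time, propagating the carry, and counts how many passes are needed.

```python
def add_with_narrow_alu(a, b, operand_bits, alu_bits):
    """Add two unsigned operand_bits-wide integers using an alu_bits-wide adder.

    Returns (sum modulo 2**operand_bits, number of ALU passes).
    """
    mask = (1 << alu_bits) - 1
    result, carry, passes = 0, 0, 0
    for shift in range(0, operand_bits, alu_bits):
        # Add one alu_bits-wide slice of each operand plus the incoming carry.
        partial = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (partial & mask) << shift
        carry = partial >> alu_bits  # carry out feeds the next slice
        passes += 1
    return result & ((1 << operand_bits) - 1), passes


if __name__ == "__main__":
    for width in (4, 8, 32):
        total, passes = add_with_narrow_alu(1234, 5678, 32, width)
        print(f"{width}-bit adder: sum={total}, passes={passes}")
```

In this model a 32-bit addition takes eight passes through a 4-bit adder but only one through a 32-bit adder, which is the sense in which wider datapaths exploit bit-level parallelism.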