Technology trends
- The natural building block for multiprocessors is the microprocessor
- Microprocessor performance increases 50% every year
- Transistor count doubles every 18 months
  - Intel Pentium 4 EE 3.4 GHz has 178 M transistors on a 237 mm² die
  - 130 nm Itanium 2 has 410 M transistors on a 374 mm² die
  - 90 nm Intel Montecito has 1.7 B transistors on a 596 mm² die
- Die area is also growing
  - Intel Prescott had 125 M transistors on a 112 mm² die
- Ever-shrinking process technology
  - Shorter gate length of transistors
  - Can afford to sweep electrons through the channel faster
  - Transistors can be clocked at a faster rate
  - Transistors also get smaller
  - Can afford to pack more on the die
  - And die size is also increasing
- What to do with so many transistors?
  - Could increase L2 or L3 cache size
    - Does not help much beyond a certain point
    - Burns more power
  - Could improve microarchitecture
    - Better branch predictor or novel designs to improve instruction-level parallelism (ILP)
  - If single-thread performance cannot be improved, have to look for thread-level parallelism (TLP)
    - Multiple cores on the die (chip multiprocessors): IBM POWER4, POWER5, Intel Montecito, Intel Pentium 4, AMD Opteron, Sun UltraSPARC IV

TLP on chip
- Instead of putting multiple cores, could put extra resources and logic to run multiple threads simultaneously (simultaneous multi-threading): Alpha 21464 (cancelled), Intel Pentium 4, IBM POWER5, Intel Montecito
- Today’s microprocessors are small-scale multiprocessors (dual-core, 2-way SMT)
- Tomorrow’s microprocessors will be larger-scale multiprocessors or highly multi-threaded
  - Sun Niagara is an 8-core (each 4-way threaded) chip: 32 threads on a single chip

Architectural trends
- Circuits: bit-level parallelism
  - Started with 4 bits (Intel 4004) [http://www.intel4004.com/]
  - Now the 32-bit processor is the norm
  - 64-bit processors are taking over (AMD Opteron, Intel Itanium, Pentium 4 family); started with Alpha, MIPS, Sun families
- Architecture: instruction-level parallelism (ILP)
  - Extract independent instruction streams
  - Key to advanced microprocessor design
  - Gradually hitting a limit: memory wall
    - Memory operations are the bottleneck
    - Need memory-level parallelism (MLP)
  - Also, technology limits such as wire delay are pushing for more distributed control rather than the centralized control in today’s processors
- If ILP cannot be boosted, what can be done?
- Thread-level parallelism (TLP)
  - Explicit parallel programs already have TLP (inherent)
  - Sequential programs that are hard to parallelize or ILP-limited can be speculatively parallelized in hardware
    - Thread-level speculation (TLS)
- Today’s trend: if nothing can be done to boost single-thread performance, invest transistors and resources to exploit TLP
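
The growth rates and die figures quoted under technology trends can be checked with a little arithmetic. The following is a minimal Python sketch, not part of the original notes: the function names (`density`, `annual_growth`, `compound`) and the `CHIPS` table are illustrative assumptions, and the only data used are the transistor counts and die areas stated above.

```python
# Figures quoted in the notes: (transistors in millions, die area in mm^2).
# The dictionary and the function names below are illustrative, not from the notes.
CHIPS = {
    "Pentium 4 EE 3.4 GHz": (178, 237),
    "Itanium 2 (130 nm)": (410, 374),
    "Montecito (90 nm)": (1700, 596),
    "Prescott": (125, 112),
}


def density(transistors_m, area_mm2):
    """Transistor density in millions of transistors per mm^2."""
    return transistors_m / area_mm2


def annual_growth(doubling_months):
    """Yearly growth factor implied by doubling every `doubling_months` months."""
    return 2 ** (12 / doubling_months)


def compound(rate_per_year, years):
    """Total growth factor after `years` at a fixed yearly rate (0.5 means 50%)."""
    return (1 + rate_per_year) ** years


if __name__ == "__main__":
    for name, (transistors_m, area_mm2) in CHIPS.items():
        print(f"{name}: {density(transistors_m, area_mm2):.2f} M transistors/mm^2")

    # Doubling every 18 months, expressed as a per-year factor.
    print(f"Transistor growth per year: x{annual_growth(18):.3f}")

    # Ten-year comparison: transistor budget versus 50%-per-year performance.
    print(f"Transistors over 10 years: x{annual_growth(18) ** 10:.1f}")
    print(f"Performance over 10 years: x{compound(0.5, 10):.1f}")
```

Doubling every 18 months corresponds to a factor of 2^(12/18), or roughly 59% growth per year, which is somewhat faster than the 50% yearly performance improvement quoted above. Compounded over a decade the transistor budget therefore pulls ahead of single-thread performance, which is one way to read the question of what to do with so many transistors.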