Module 1: "Multi-core: The Ultimate Dose of Moore's Law"
  Lecture 2: "Moore's Law and Multi-cores"
 

Thread-level Parallelism:

  • Look for concurrency at a granularity coarser than instructions
    • A first approximation: group a chunk of consecutive instructions together and call it a thread (largely wrong, but a convenient starting point!)
    • Each thread can be seen as a “dynamic” subgraph of the sequential control-flow graph: e.g., take a loop, unroll its graph, and give each thread a subset of the iterations
    • The edges spanning the subgraphs represent data dependences across threads (the spanning control edges are usually converted to data edges through suitable transformations)
      • The goal of parallelization is to minimize such edges
      • Threads should mostly compute independently on different cores, but need to talk once in a while to get things done! (see the sketch after this list)
  • Parallelizing sequential programs is fun, but often tedious for non-experts
    • So look for parallelism at even coarser grain
    • Run multiple independent programs simultaneously
      • Known as multi-programming
      • The biggest reason why everyday Windows users buy small-scale multiprocessors and multi-cores today
      • Can play games while running heavy-weight simulations and downloading movies
      • Have you seen the state of the poor machine when running anti-virus?
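
A minimal sketch of the thread decomposition above, assuming POSIX threads (the array size, thread count, and the worker/partial names are all illustrative): each thread computes a private partial sum over its own chunk of an array, so the parallel loop has no dependence edges between threads; the only cross-thread edges are the partial-sum reads in main().

    /* Parallel reduction with minimal cross-thread dependence edges.
       Compile with: gcc -pthread sum.c */
    #include <pthread.h>
    #include <stdio.h>

    #define N        1000000
    #define NTHREADS 4          /* assumes NTHREADS divides N evenly */

    static int a[N];
    static long long partial[NTHREADS];

    static void *worker(void *arg)
    {
        long id = (long)arg;
        long long sum = 0;
        /* This loop is one thread's "dynamic subgraph": it reads and
           writes only thread-private data, so no edges leave it. */
        for (long i = id * (N / NTHREADS); i < (id + 1) * (N / NTHREADS); i++)
            sum += a[i];
        partial[id] = sum;      /* the only value that crosses threads */
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];
        for (long i = 0; i < N; i++)
            a[i] = 1;
        for (long id = 0; id < NTHREADS; id++)
            pthread_create(&t[id], NULL, worker, (void *)id);
        long long total = 0;
        for (long id = 0; id < NTHREADS; id++) {
            pthread_join(t[id], NULL);   /* the "talk once in a while" */
            total += partial[id];
        }
        printf("total = %lld\n", total);
        return 0;
    }

The join in main() is the single point where the threads “talk”; everything before it runs edge-free in parallel, which is exactly what the parallelization goal above asks for.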

Communication in Multi-core:

  • Ideal for shared address space
    • Fast on-chip hardwired communication through the cache hierarchy (no OS intervention; see the sketch after this list)
    • Two types of architectures
      • Tiled CMP: each core has its own private cache hierarchy (no cache sharing); e.g., Intel Pentium D, dual-core AMD Opteron, Intel Montecito, Sun UltraSPARC IV, IBM Cell (more specialized)
      • Shared cache CMP: the outermost level of the cache hierarchy is shared among cores; e.g., Intel Woodcrest (server-grade Core 2), Intel Conroe (Core 2 Duo for desktops), Sun Niagara, IBM Power4, IBM Power5
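
To make “communication through cache” concrete, here is a minimal sketch assuming POSIX threads and C11 atomics (the producer/consumer names and the value 42 are illustrative): one thread publishes a value, the other spins on a flag, and the transfer happens entirely through ordinary loads and stores that the cache-coherence hardware carries between cores, with no OS intervention.

    /* Core-to-core communication through the shared address space.
       Compile with: gcc -pthread -std=c11 comm.c */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static int data;              /* payload: an ordinary shared variable */
    static atomic_int ready = 0;  /* flag moved between cores by coherence */

    static void *producer(void *arg)
    {
        (void)arg;
        data = 42;                /* write the payload */
        /* Release store: publishes data; the coherence protocol hands
           the cache line to the consumer's core in hardware. */
        atomic_store_explicit(&ready, 1, memory_order_release);
        return NULL;
    }

    static void *consumer(void *arg)
    {
        (void)arg;
        /* Each poll is just a cache access; no system call is made. */
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;                     /* spin until the flag arrives */
        printf("got %d\n", data);
        return NULL;
    }

    int main(void)
    {
        pthread_t p, c;
        pthread_create(&c, NULL, consumer, NULL);
        pthread_create(&p, NULL, producer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }

Contrast this with OS-mediated message passing: here the “send” is one store and the “receive” is one load, and on a shared cache CMP the data may never even leave the shared outer level of the hierarchy.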