Thread-level Parallelism:
- Look for concurrency at a granularity coarser than instructions
- Put a chunk of consecutive instructions together and call it a thread (a rough and largely wrong definition!)
- Each thread can be seen as a “dynamic” subgraph of the sequential control-flow graph: e.g., take a loop, unroll its graph, and give each thread a group of iterations
- The edges spanning the subgraphs represent data dependence across threads (the spanning control edges are usually converted to data edges through suitable transformations)
- The goal of parallelization is to minimize such edges
- Threads should mostly compute independently on different cores, but need to talk once in a while to get things done! (see the sketch after this list)
- Parallelizing sequential programs is fun, but often tedious for non-experts
- So look for parallelism at even coarser grain
- Run multiple independent programs simultaneously
- Known as multi-programming
- The biggest reason why everyday Windows users buy small-scale multiprocessors and multi-core machines today
- Can play games while running heavy-weight simulations and downloading movies
- Have you seen the state of the poor machine when running anti-virus?
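To make the parallelization picture concrete, here is a minimal sketch (not from the notes; the thread count, array size, and the summing work are illustrative choices) using POSIX threads. Each thread computes independently on its own chunk; the only cross-thread dependence edges are the per-thread partial sums combined at the end.

/*
 * Minimal sketch of thread-level parallelism with POSIX threads.
 * Each thread sums its own chunk of the array independently; the only
 * cross-thread "edge" is the final combine of the partial sums.
 * Compile with: cc -pthread sketch.c
 */
#include <pthread.h>
#include <stdio.h>

#define N        (1 << 20)
#define NTHREADS 4              /* illustrative choice */

static double data[N];

struct chunk {
    int    begin, end;          /* half-open iteration range [begin, end)   */
    double partial;             /* per-thread result, written by one thread */
};

static void *sum_chunk(void *arg)
{
    struct chunk *c = arg;
    double s = 0.0;
    for (int i = c->begin; i < c->end; i++)
        s += data[i];           /* independent work, no sharing             */
    c->partial = s;             /* the lone value crossing thread boundaries */
    return NULL;
}

int main(void)
{
    pthread_t    tid[NTHREADS];
    struct chunk ck[NTHREADS];

    for (int i = 0; i < N; i++)
        data[i] = 1.0;          /* dummy input */

    int per = N / NTHREADS;
    for (int t = 0; t < NTHREADS; t++) {
        ck[t].begin = t * per;
        ck[t].end   = (t == NTHREADS - 1) ? N : (t + 1) * per;
        pthread_create(&tid[t], NULL, sum_chunk, &ck[t]);
    }

    double total = 0.0;
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        total += ck[t].partial; /* combine: the minimized cross-thread edges */
    }
    printf("sum = %f\n", total);
    return 0;
}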
Communication in Multi-core:
- Ideal for shared address space
- Fast on-chip hardwired communication through the cache hierarchy, with no OS intervention (a sketch follows this list)
- Two types of architectures
- Tiled CMP: each core has its private cache hierarchy (no cache sharing); Intel Pentium D, Dual Core Opteron, Intel Montecito, Sun UltraSPARC IV, IBM Cell (more specialized)
- Shared cache CMP: Outermost level of cache hierarchy is shared among cores; Intel Woodcrest (server-grade Core duo), Intel Conroe (Core2 duo for desktop), Sun Niagara, IBM Power4, IBM Power5
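As an illustration of communication through the shared address space, here is a minimal sketch (again an assumption-laden example, not from the notes): a producer thread hands a value to a consumer thread through shared memory guarded by an atomic flag. On either a tiled or shared-cache CMP, the handoff travels through the on-chip caches and the coherence protocol, with no OS involvement.

/*
 * Minimal sketch of shared-memory communication between two threads.
 * The producer writes a value and sets a flag; the consumer spins on the
 * flag and then reads the value. The update propagates through the
 * on-chip cache hierarchy / coherence protocol, not through the OS.
 * Compile with: cc -pthread handoff.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int        payload;      /* data being handed over */
static atomic_int ready = 0;    /* synchronization flag   */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;               /* write the data */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    /* Spin until the flag becomes visible on this core. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    printf("consumer read %d\n", payload);
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}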