Module 1: "Multi-core: The Ultimate Dose of Moore's Law"
  Lecture 2: "Moore's Law and Multi-cores"
 

Thread-level Parallelism:

  • Look for concurrency at a granularity coarser than instructions
    • A first approximation: group a chunk of consecutive instructions together and call it a thread (largely wrong, but a convenient starting point!)
    • Each thread can be seen as a “dynamic” subgraph of the sequential control-flow graph: e.g., take a loop, unroll its graph, and give each thread a subset of the iterations
    • The edges spanning the subgraphs represent data dependences across threads (the spanning control edges are usually converted to data edges through suitable transformations)
      • The goal of parallelization is to minimize such edges
      • Threads should mostly compute independently on different cores, but need to talk once in a while to get things done! (see the sketch after this list)
  • Parallelizing sequential programs is fun, but often tedious for non-experts
    • So look for parallelism at even coarser grain
    • Run multiple independent programs simultaneously
      • Known as multi-programming
      • The biggest reason why everyday Windows users buy small-scale multiprocessors and multi-cores today
      • Can play games while running heavy-weight simulations and downloading movies
      • Have you seen the state of the poor machine when running anti-virus?
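
A minimal sketch of the thread decomposition above, assuming POSIX threads (the array size, thread count, and the worker/partial names are all illustrative): each thread computes a private partial sum over its own chunk of an array, so the parallel loop has no dependence edges between threads; the only cross-thread edges are the partial-sum reads in main().

    /* Parallel reduction with minimal cross-thread dependence edges.
       Compile with: gcc -pthread sum.c */
    #include <pthread.h>
    #include <stdio.h>

    #define N        1000000
    #define NTHREADS 4          /* assumes NTHREADS divides N evenly */

    static int a[N];
    static long long partial[NTHREADS];

    static void *worker(void *arg)
    {
        long id = (long)arg;
        long long sum = 0;
        /* This loop is one thread's "dynamic subgraph": it reads and
           writes only thread-private data, so no edges leave it. */
        for (long i = id * (N / NTHREADS); i < (id + 1) * (N / NTHREADS); i++)
            sum += a[i];
        partial[id] = sum;      /* the only value that crosses threads */
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];
        for (long i = 0; i < N; i++)
            a[i] = 1;
        for (long id = 0; id < NTHREADS; id++)
            pthread_create(&t[id], NULL, worker, (void *)id);
        long long total = 0;
        for (long id = 0; id < NTHREADS; id++) {
            pthread_join(t[id], NULL);   /* the "talk once in a while" */
            total += partial[id];
        }
        printf("total = %lld\n", total);
        return 0;
    }

The join in main() is the single point where the threads “talk”; everything before it runs edge-free in parallel, which is exactly what the parallelization goal above asks for.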

Communication in Multi-core:

  • Ideal for shared address space
    • Fast on-chip hardwired communication through the cache hierarchy (no OS intervention; see the sketch after this list)
    • Two types of architectures
      • Tiled CMP: each core has its own private cache hierarchy (no cache sharing); e.g., Intel Pentium D, dual-core AMD Opteron, Intel Montecito, Sun UltraSPARC IV, IBM Cell (more specialized)
      • Shared cache CMP: the outermost level of the cache hierarchy is shared among cores; e.g., Intel Woodcrest (server-grade Core 2), Intel Conroe (Core 2 Duo for desktops), Sun Niagara, IBM Power4, IBM Power5
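
To make “communication through cache” concrete, here is a minimal sketch assuming POSIX threads and C11 atomics (the producer/consumer names and the value 42 are illustrative): one thread publishes a value, the other spins on a flag, and the transfer happens entirely through ordinary loads and stores that the cache-coherence hardware carries between cores, with no OS intervention.

    /* Core-to-core communication through the shared address space.
       Compile with: gcc -pthread -std=c11 comm.c */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static int data;              /* payload: an ordinary shared variable */
    static atomic_int ready = 0;  /* flag moved between cores by coherence */

    static void *producer(void *arg)
    {
        (void)arg;
        data = 42;                /* write the payload */
        /* Release store: publishes data; the coherence protocol hands
           the cache line to the consumer's core in hardware. */
        atomic_store_explicit(&ready, 1, memory_order_release);
        return NULL;
    }

    static void *consumer(void *arg)
    {
        (void)arg;
        /* Each poll is just a cache access; no system call is made. */
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;                     /* spin until the flag arrives */
        printf("got %d\n", data);
        return NULL;
    }

    int main(void)
    {
        pthread_t p, c;
        pthread_create(&c, NULL, consumer, NULL);
        pthread_create(&p, NULL, producer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }

Contrast this with OS-mediated message passing: here the “send” is one store and the “receive” is one load, and on a shared cache CMP the data may never even leave the shared outer level of the hierarchy.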