Module 3: Parallel Programming: Shared Memory and Message Passing
  Lecture 8: Optimizing Shared Memory Performance
 


Task Stealing

  • A processor whose own task queue is empty may choose to steal tasks from another processor's queue
    • How many tasks to steal? Whom to steal from?
    • The biggest question: how to detect termination? This is really a distributed consensus problem: every processor must agree that all queues are empty and no task is still running
    • Task stealing generally increases overhead and communication, but a smart design can achieve excellent load balance (such a scheme is normally hard to design efficiently)
    • Task stealing is a form of a more general technique called Receiver Initiated Diffusion (RID), where the receiver of the task initiates the task transfer
    • In Sender Initiated Diffusion (SID), a processor may choose to insert tasks into another processor's queue if its own task queue has grown beyond a threshold
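
The task-stealing questions above can be made concrete with a small sketch. It is only one possible design, not a reference implementation: the class name StealingPool, the "steal half from a randomly chosen victim" policy, and the shared pending counter used for termination detection are all assumptions chosen for illustration. In particular, the shared counter sidesteps the distributed-consensus problem by centralizing it, which is simple but would itself become a point of contention on a large machine.

```python
import random
import threading
import time
from collections import deque


class StealingPool:
    """Toy receiver-initiated scheduler: one lock-protected deque per worker."""

    def __init__(self, num_workers):
        self.queues = [deque() for _ in range(num_workers)]
        self.locks = [threading.Lock() for _ in range(num_workers)]
        self.pending = 0            # tasks submitted but not yet finished
        self.steals = 0             # successful steals (statistics only)
        self.results = []
        self.state_lock = threading.Lock()  # guards pending, steals, results

    def submit(self, worker_id, task):
        # Count the task before it becomes visible, so pending never undercounts.
        with self.state_lock:
            self.pending += 1
        with self.locks[worker_id]:
            self.queues[worker_id].append(task)

    def _pop_local(self, me):
        # The owner takes the newest task from the tail of its own deque.
        with self.locks[me]:
            return self.queues[me].pop() if self.queues[me] else None

    def _steal(self, me, rng):
        # "Whom to steal from?" Assumed policy: try victims in random order.
        victims = [v for v in range(len(self.queues)) if v != me]
        rng.shuffle(victims)
        for v in victims:
            with self.locks[v]:
                # "How many to steal?" Assumed policy: half, rounded up,
                # taken from the head (the oldest tasks).
                count = (len(self.queues[v]) + 1) // 2
                loot = [self.queues[v].popleft() for _ in range(count)]
            if loot:
                # Only one lock is held at a time, so thieves cannot deadlock.
                with self.locks[me]:
                    self.queues[me].extend(loot[1:])
                with self.state_lock:
                    self.steals += 1
                return loot[0]
        return None

    def _worker(self, me):
        rng = random.Random(me)     # private generator per worker
        while True:
            task = self._pop_local(me)
            if task is None:
                task = self._steal(me, rng)
            if task is None:
                with self.state_lock:
                    if self.pending == 0:   # nothing queued or running anywhere
                        return
                time.sleep(0)               # yield the CPU, then retry
                continue
            value = task()
            with self.state_lock:
                self.results.append(value)
                self.pending -= 1

    def run(self):
        threads = [threading.Thread(target=self._worker, args=(i,))
                   for i in range(len(self.queues))]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return self.results
```

Submitting every task to worker 0 and then calling run() on a four-worker pool forces the other three workers to steal. Results arrive in a nondeterministic order, and the number of steals varies from run to run. A sender-initiated (SID) variant would instead have submit() push tasks to another worker's deque once the local deque exceeds a threshold.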

Architect's Job

  • Normally load balancing is a responsibility of the programmer
    • However, an architecture may provide efficient primitives to implement task queues and task stealing
    • For example, the task queue may be allocated in a special shared memory segment, accesses to which may be optimized by special hardware in the memory controller
    • But this may expose some of the architectural features to the programmer
    • Some multiprocessors provide efficient implementations of certain synchronization primitives; cheaper synchronization lowers the cost of accessing shared task queues, which may improve load balance
    • Sophisticated hardware tricks are possible, such as monitoring load dynamically and favoring slow threads
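
To illustrate why an efficient synchronization primitive can help load balance, the sketch below hands out loop iterations from a single shared counter, so a thread that finishes early simply claims more work. Fetch-and-add is an assumed example of such a primitive; the notes do not name a specific one. The names FetchAndAdd and self_schedule are hypothetical, and the lock inside FetchAndAdd is only a software stand-in for what suitable hardware would perform as one atomic operation.

```python
import threading


class FetchAndAdd:
    """Software stand-in for an atomic fetch-and-add primitive."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()   # hardware would need no lock here

    def fetch_add(self, delta=1):
        # Return the old value and advance the counter in one indivisible step.
        with self._lock:
            old = self._value
            self._value += delta
            return old


def self_schedule(fn, n, num_threads, chunk=4):
    """Compute fn(i) for i in range(n), with threads claiming chunks on demand."""
    counter = FetchAndAdd()
    results = [None] * n
    claimed = [0] * num_threads         # iterations executed by each thread

    def worker(tid):
        while True:
            start = counter.fetch_add(chunk)    # claim the next chunk
            if start >= n:
                return                          # no iterations left
            for i in range(start, min(start + chunk, n)):
                results[i] = fn(i)              # chunks are disjoint: no lock needed
                claimed[tid] += 1               # each thread writes only its own slot

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, claimed
```

The chunk size trades synchronization cost against balance: larger chunks mean fewer trips to the shared counter, while smaller chunks leave less work stranded on a slow thread. The cheaper the primitive, the smaller the chunk that remains practical.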