Module 3: Parallel Programming: Shared Memory and Message Passing
Lecture 8: Optimizing Shared Memory Performance
Task Stealing
A processor whose own task queue is empty may choose to steal tasks from another processor's queue
How many tasks to steal? Whom to steal from?
The biggest question: how to detect termination? This is really a distributed consensus problem, since all processors must agree that no work remains anywhere
Task stealing, in general, may increase overhead and communication, but a smart design may lead to excellent load balance (such a design is normally hard to make efficient)
Task stealing is a form of a more general technique called Receiver Initiated Diffusion (RID), in which the receiver of the task initiates the task transfer
In Sender Initiated Diffusion (SID), a processor may choose to insert tasks into another processor's queue if its own task queue has grown beyond a threshold
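The questions raised on this slide can be made concrete with a small sketch. The code below is an illustrative toy, not a production scheduler, and every name in it (`WorkStealingPool`, `submit`, `count_tree_nodes`) is hypothetical. It assumes one particular set of answers: an idle worker scans the other queues in round-robin order (whom to steal from), takes half of the victim's tasks from the opposite end of the queue (how many to steal), and detects termination with a single shared counter of unfinished tasks. A shared counter is only one simple option on a shared memory machine; it is not the only way to reach agreement on termination.

```python
import threading
import time
from collections import deque


class WorkStealingPool:
    """Toy receiver-initiated task-stealing pool (illustrative only)."""

    def __init__(self, num_workers):
        self.queues = [deque() for _ in range(num_workers)]
        self.locks = [threading.Lock() for _ in range(num_workers)]
        self.pending = 0                     # tasks submitted but not yet finished
        self.pending_lock = threading.Lock()
        self.executed = [0] * num_workers    # slot i is written only by worker i

    def submit(self, worker_id, task):
        # Count the task before it becomes visible, so `pending` never
        # under-reports the amount of outstanding work.
        with self.pending_lock:
            self.pending += 1
        with self.locks[worker_id]:
            self.queues[worker_id].append(task)

    def _pop_local(self, me):
        with self.locks[me]:
            return self.queues[me].pop() if self.queues[me] else None

    def _steal(self, me):
        n = len(self.queues)
        for offset in range(1, n):           # assumed policy: round-robin victim scan
            victim = (me + offset) % n
            with self.locks[victim]:
                count = (len(self.queues[victim]) + 1) // 2   # assumed policy: steal half
                stolen = [self.queues[victim].popleft() for _ in range(count)]
            if stolen:
                # The victim's lock is released before taking our own lock,
                # so no worker ever holds two queue locks at once.
                with self.locks[me]:
                    self.queues[me].extend(stolen[1:])
                return stolen[0]
        return None

    def _worker(self, me):
        while True:
            task = self._pop_local(me)
            if task is None:
                task = self._steal(me)
            if task is None:
                with self.pending_lock:
                    if self.pending == 0:    # nothing queued, nothing running
                        return
                time.sleep(0)                # yield, then look for work again
                continue
            task(self, me)                   # a task may submit child tasks
            self.executed[me] += 1
            with self.pending_lock:
                self.pending -= 1            # decremented only after children are counted

    def run(self):
        threads = [threading.Thread(target=self._worker, args=(i,))
                   for i in range(len(self.queues))]
        for t in threads:
            t.start()
        for t in threads:
            t.join()


def count_tree_nodes(depth, num_workers=4):
    """Visit a full binary tree of the given depth, one task per node."""
    pool = WorkStealingPool(num_workers)

    def make_node(d):
        def node(pool, me):
            if d > 0:                        # spawn two children on the local queue
                pool.submit(me, make_node(d - 1))
                pool.submit(me, make_node(d - 1))
        return node

    pool.submit(0, make_node(depth))         # all work starts on worker 0
    pool.run()
    return pool
```

Calling `count_tree_nodes(8)` starts with a single task on worker 0's queue, so the other workers get work only by stealing. The counter argument for termination is that a running task is still counted in `pending`, and any children it submits are counted before it finishes; `pending == 0` therefore implies that no task is queued, in flight between queues, or running. How the work ends up split across `pool.executed` depends on thread scheduling and will differ from run to run.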
Architect's Job
Normally, load balancing is the responsibility of the programmer
However, an architecture may provide efficient primitives to implement task queues and task stealing
For example, the task queue may be allocated in a special shared memory segment, accesses to which may be optimized by special hardware in the memory controller
But this may expose some of the architectural features to the programmer
There are multiprocessors that provide efficient implementations of certain synchronization primitives; cheaper synchronization lowers the cost of task queue operations, which may improve load balance
Sophisticated hardware tricks are possible: for example, monitoring load dynamically and favoring slow threads at run time