Task Stealing

A processor whose own task queue is empty may choose to steal tasks from another processor's queue.
How many tasks to steal?
Whom to steal from?
The biggest question: how to detect termination? This is really a distributed consensus problem.
Task stealing may, in general, increase overhead and communication, but a smart design can lead to excellent load balance (such a design is normally hard to make efficient).
This is a form of a more general technique called Receiver Initiated Diffusion (RID), in which the receiver of the task initiates the task transfer.
In Sender Initiated Diffusion (SID), a processor may choose to insert tasks into another processor's queue if its own task queue is full above a threshold.

Architect's Job

Normally, load balancing is the responsibility of the programmer.
However, an architecture may provide efficient primitives to implement task queues and task stealing.
For example, the task queue may be allocated in a special shared memory segment, accesses to which may be optimized by special hardware in the memory controller.
But this may expose some of the architectural features to the programmer.
Some multiprocessors provide efficient implementations of certain synchronization primitives; this may improve load balance.
Sophisticated hardware tricks are possible: dynamic load monitoring and dynamically favoring slow threads.
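
The three task-stealing questions raised above (how many to steal, whom to steal from, how to detect termination) can be made concrete with a small shared-memory sketch. The following Python code is only an illustration under stated assumptions, not a method prescribed by the lecture: the names `run_with_stealing`, `steal`, and `state` are hypothetical, the thief picks a random victim, it steals half of the victim's queue, and termination is detected with a single shared counter of unfinished tasks. That counter is the simplest possible choice and only works because all workers share memory; on a distributed-memory machine a real termination-detection protocol would be needed. Tasks are assumed not to raise exceptions and not to spawn further tasks.

```python
import random
import threading
from collections import deque


def run_with_stealing(tasks, num_workers=4, seed=0):
    """Run every zero-argument callable in `tasks` on `num_workers` threads.

    Returns (results, steals); steals[w] counts worker w's successful steals.
    """
    queues = [deque() for _ in range(num_workers)]
    locks = [threading.Lock() for _ in range(num_workers)]
    queues[0].extend(tasks)           # deliberately imbalanced: worker 0 owns everything
    state = {"pending": len(tasks)}   # unfinished tasks (assumed shared counter)
    state_lock = threading.Lock()
    results = []
    steals = [0] * num_workers        # each worker writes only its own slot

    def steal(thief, rng):
        # Whom to steal from? Assumption: a uniformly random other worker.
        victim = rng.choice([w for w in range(num_workers) if w != thief])
        with locks[victim]:
            # How many to steal? Assumption: half the victim's queue, rounded up.
            count = (len(queues[victim]) + 1) // 2
            loot = [queues[victim].pop() for _ in range(count)]  # take from the tail
        if loot:
            with locks[thief]:
                queues[thief].extend(loot)
            steals[thief] += 1

    def worker(me):
        rng = random.Random(seed + me)
        while True:
            with locks[me]:
                task = queues[me].popleft() if queues[me] else None
            if task is None:
                with state_lock:
                    if state["pending"] == 0:  # termination: no unfinished task anywhere
                        return
                steal(me, rng)                 # receiver-initiated transfer
                continue
            value = task()
            with state_lock:
                results.append(value)
                state["pending"] -= 1          # decremented only after completion

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, steals


if __name__ == "__main__":
    demo_tasks = [lambda i=i: i * i for i in range(100)]
    demo_results, demo_steals = run_with_stealing(demo_tasks)
    print(sum(demo_results), demo_steals)
```

Because the counter is decremented only after a task completes, tasks that are in transit between a victim and a thief still count as pending, so no worker can exit while work remains. The idle workers in this sketch spin while they look for work, which shows the overhead side of the trade-off; the number of steals per worker varies from run to run with thread scheduling.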