Module 8: "Performance Issues"
  Lecture 14: "Load Balancing and Domain Decomposition"
 

Dynamic task queues

  • Introduced in the last lecture
  • Normally implemented as part of the parallel program
  • Two possible designs
    • Centralized task queue: a single queue of tasks; may lead to heavy contention because every insertion into and deletion from the queue must be a critical section
    • Distributed task queues: one queue per processor
  • Issue with distributed task queues
    • When a processor’s own queue is empty, what does that processor do? Answer: task stealing
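
The two designs above can be contrasted with a minimal sketch. This is an illustration only, not the lecture's implementation: the class names CentralizedTaskQueue and DistributedTaskQueues are hypothetical, Python threads stand in for processors, and a lock stands in for the critical section.

```python
import threading
from collections import deque


class CentralizedTaskQueue:
    """Single shared queue: every insert and delete is a critical section."""

    def __init__(self):
        self._tasks = deque()
        self._lock = threading.Lock()  # all processors contend on this one lock

    def put(self, task):
        with self._lock:
            self._tasks.append(task)

    def get(self):
        """Return the oldest task, or None if the queue is empty."""
        with self._lock:
            return self._tasks.popleft() if self._tasks else None


class DistributedTaskQueues:
    """One queue and one lock per processor (assumed layout)."""

    def __init__(self, num_procs):
        self._queues = [deque() for _ in range(num_procs)]
        self._locks = [threading.Lock() for _ in range(num_procs)]

    def put(self, proc_id, task):
        with self._locks[proc_id]:
            self._queues[proc_id].append(task)

    def get(self, proc_id):
        """Return a task from proc_id's own queue, or None if it is empty."""
        with self._locks[proc_id]:
            queue = self._queues[proc_id]
            return queue.popleft() if queue else None
```

In the distributed version a processor normally touches only its own lock, so contention drops, but get() can return None for one processor while other queues still hold work. That is the issue task stealing addresses.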

Task stealing

  • A processor whose own queue is empty may steal tasks from another processor’s queue
    • How many tasks to steal? Whom to steal from?
    • The biggest question: how to detect termination? This is really a distributed consensus problem, because an empty local queue does not mean that the whole computation is finished
    • Task stealing, in general, may increase overhead and communication, but a smart design may lead to excellent load balance (normally hard to design efficiently)
    • This is an instance of a more general technique called Receiver Initiated Diffusion (RID), where the receiver of the task initiates the task transfer
    • In Sender Initiated Diffusion (SID), a processor may insert tasks into another processor’s queue when the length of its own task queue exceeds a threshold
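
A minimal sketch of receiver-initiated task stealing follows, under several assumptions that are not from the lecture: threads stand in for processors, a thief steals one task at a time from a randomly chosen victim, the owner takes from the tail of its queue while thieves take from the head, and termination is detected with a single shared counter of unfinished tasks (a shared-memory shortcut, not a true distributed consensus protocol). The names run_with_stealing and spawn_binary are hypothetical.

```python
import random
import threading
import time
from collections import deque


def run_with_stealing(initial_tasks, num_workers, process):
    """Run tasks on num_workers threads; idle workers steal from others.

    process(task) must return a list of newly spawned tasks (may be empty).
    Returns the number of tasks executed by each worker.
    """
    queues = [deque() for _ in range(num_workers)]
    locks = [threading.Lock() for _ in range(num_workers)]
    queues[0].extend(initial_tasks)  # deliberately unbalanced start

    pending = [len(initial_tasks)]   # tasks that are queued or in progress
    pending_lock = threading.Lock()
    executed = [0] * num_workers

    def take(victim, steal):
        with locks[victim]:
            queue = queues[victim]
            if not queue:
                return None
            # Thieves take the oldest task, the owner takes the newest.
            return queue.popleft() if steal else queue.pop()

    def worker(me):
        while True:
            task = take(me, steal=False)
            if task is None:
                # Own queue is empty: the receiver initiates the transfer.
                victims = [v for v in range(num_workers) if v != me]
                random.shuffle(victims)
                for victim in victims:
                    task = take(victim, steal=True)
                    if task is not None:
                        break
            if task is None:
                with pending_lock:
                    if pending[0] == 0:
                        return       # nothing queued or running anywhere
                time.sleep(0.0001)   # work is still in flight; back off
                continue
            children = process(task)
            # Count the children before publishing them, so the counter
            # never drops to zero while work still exists.
            with pending_lock:
                pending[0] += len(children) - 1
            with locks[me]:
                queues[me].extend(children)
            executed[me] += 1

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return executed


def spawn_binary(depth):
    """Hypothetical workload: a task of depth d spawns two tasks of depth d-1."""
    return [depth - 1, depth - 1] if depth > 0 else []
```

For example, run_with_stealing([4], 4, spawn_binary) executes all 31 tasks of the binary tree; how they are split among the four workers varies from run to run. A sender-initiated variant would instead have a worker push tasks into another queue when its own queue length exceeds a threshold.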

Architect’s job

  • Normally load balancing is a responsibility of the programmer
    • However, an architecture may provide efficient primitives to implement task queues and task stealing
    • For example, the task queue may be allocated in a special shared memory segment, accesses to which may be optimized by special hardware in the memory controller
    • But this may expose some of the architectural features to the programmer
    • There are multiprocessors that provide efficient implementations for certain synchronization primitives; this may improve load balance
    • Sophisticated hardware tricks are possible: dynamic load monitoring and favoring slow threads dynamically

Partitioning and communication

  • Need to reduce inherent communication
    • This is the part of communication determined by assignment of tasks
    • There may be other communication traffic also (more later)
  • Goal is to assign tasks such that accessed data are mostly local to a process
    • Ideally I do not want any communication
    • But in life sometimes you need to talk to people to get some work done!
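
To make the effect of assignment on inherent communication concrete, here is a small sketch built on an assumed example that is not from the lecture: an n x n grid in which each point needs the values of its four nearest neighbors. The code counts neighbor pairs owned by different processes under three hypothetical assignments; every such pair implies communication. It assumes p divides n, and for square_blocks that p is a perfect square whose root divides n.

```python
import math


def cross_partition_pairs(n, owner):
    """Count adjacent grid-point pairs whose owners differ.

    owner(i, j) returns the process that owns point (i, j).
    """
    count = 0
    for i in range(n):
        for j in range(n):
            if i + 1 < n and owner(i, j) != owner(i + 1, j):
                count += 1
            if j + 1 < n and owner(i, j) != owner(i, j + 1):
                count += 1
    return count


def block_rows(n, p):
    """Each process owns n // p contiguous rows."""
    rows_per_proc = n // p
    return lambda i, j: i // rows_per_proc


def cyclic_rows(n, p):
    """Rows are dealt out to processes round-robin."""
    return lambda i, j: i % p


def square_blocks(n, p):
    """Each process owns one square tile of the grid."""
    side = math.isqrt(p)        # tiles per dimension
    tile = n // side            # points per tile edge
    return lambda i, j: (i // tile) * side + (j // tile)


if __name__ == "__main__":
    n, p = 16, 4
    print(cross_partition_pairs(n, block_rows(n, p)))     # 48
    print(cross_partition_pairs(n, cyclic_rows(n, p)))    # 240
    print(cross_partition_pairs(n, square_blocks(n, p)))  # 32
```

All three assignments give every process the same number of grid points, so load balance is identical; only the amount of non-local data differs, and in this example it differs by more than a factor of seven.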