Load Balancing
- Achievable speedup is bounded above by
- Sequential execution time / Maximum time taken by any processor (see the worked example after this list)
- Thus speedup is maximized when the maximum and minimum times across all processors are close (we want to minimize the variance of per-processor execution time)
- This translates directly into a load-balancing problem
- What leads to a high variance?
- Ultimately all processors finish at the same time
- But some do useful work throughout this period, while others may spend a significant fraction of it waiting at synchronization points
- This may arise from a bad partitioning
- There may be other architectural reasons for load imbalance that are beyond the programmer's control, e.g., network congestion, unforeseen cache conflicts, etc. (these slow down a few threads)
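
As an illustration of the bound above, here is a minimal C sketch with made-up numbers (a 100-unit sequential run split unevenly across 4 processors); it simply evaluates sequential time divided by the maximum per-processor time.

```c
/* Illustrative numbers only: a run that takes 100 time units sequentially,
 * split unevenly across 4 processors. */
#include <stdio.h>

int main(void) {
    double seq_time = 100.0;                        /* sequential execution time */
    double proc_time[] = {30.0, 25.0, 20.0, 25.0};  /* per-processor parallel time */
    int nproc = 4;

    double max_time = proc_time[0];
    for (int i = 1; i < nproc; i++)
        if (proc_time[i] > max_time)
            max_time = proc_time[i];

    /* Bound: sequential time / max per-processor time = 100 / 30 = 3.33x,
     * short of the ideal 4x because the load is imbalanced. */
    printf("speedup upper bound = %.2f\n", seq_time / max_time);
    return 0;
}
```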
Dynamic Task Queues
- Introduced in the last lecture
- Normally implemented as part of the parallel program
- Two possible designs
- Centralized task queue: a single shared queue of tasks; may lead to heavy contention because insertions into and deletions from the queue must be critical sections
- Distributed task queues: one queue per processor
- Issue with distributed task queues
- When the queue of a particular processor becomes empty, what does that processor do? Task stealing (sketched after this list)
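
Below is a minimal pthreads sketch of the distributed design under stated assumptions: task_t, queue_t, steal, and worker are illustrative names, not from the lecture. A centralized design would be the same code with a single shared queue_t, which is exactly why its one lock becomes the contention point; here each processor works from its own queue and only touches a victim's lock when its own queue runs dry.

```c
/* Sketch of distributed task queues with task stealing, assuming pthreads.
 * Task, queue, and function names are illustrative, not from the lecture. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NPROC 4

typedef struct task {
    void (*run)(void *arg);
    void *arg;
    struct task *next;
} task_t;

typedef struct {
    task_t *head;
    pthread_mutex_t lock;   /* insertion and deletion are critical sections */
} queue_t;

static queue_t local_q[NPROC];   /* one queue per processor; a centralized
                                    design would use a single shared queue_t,
                                    making its one lock the contention point */

static void q_push(queue_t *q, task_t *t) {
    pthread_mutex_lock(&q->lock);
    t->next = q->head;
    q->head = t;
    pthread_mutex_unlock(&q->lock);
}

static task_t *q_pop(queue_t *q) {
    pthread_mutex_lock(&q->lock);
    task_t *t = q->head;
    if (t) q->head = t->next;
    pthread_mutex_unlock(&q->lock);
    return t;                /* NULL means this queue is empty */
}

/* When the local queue is empty, scan the other processors' queues. */
static task_t *steal(int self) {
    for (int v = 0; v < NPROC; v++) {
        if (v == self) continue;
        task_t *t = q_pop(&local_q[v]);   /* brief lock on the victim's queue */
        if (t) return t;
    }
    return NULL;             /* nothing left anywhere */
}

static void *worker(void *p) {
    int self = (int)(long)p;
    for (;;) {
        task_t *t = q_pop(&local_q[self]);
        if (!t) t = steal(self);          /* task stealing on an empty queue */
        if (!t) break;                    /* no local work, nothing to steal */
        t->run(t->arg);
        free(t);
        /* Tasks spawned while running would be pushed onto local_q[self]. */
    }
    return NULL;
}

/* Toy task body, just to make the sketch executable. */
static void print_task(void *arg) {
    printf("ran task %ld\n", (long)arg);
}

int main(void) {
    pthread_t tid[NPROC];

    for (int p = 0; p < NPROC; p++)
        pthread_mutex_init(&local_q[p].lock, NULL);

    /* Deliberately imbalanced initial partitioning: all tasks on queue 0. */
    for (long i = 0; i < 16; i++) {
        task_t *t = malloc(sizeof *t);
        t->run = print_task;
        t->arg = (void *)i;
        q_push(&local_q[0], t);
    }

    for (long p = 0; p < NPROC; p++)
        pthread_create(&tid[p], NULL, worker, (void *)p);
    for (int p = 0; p < NPROC; p++)
        pthread_join(tid[p], NULL);
    return 0;
}
```

A real runtime would typically steal from the opposite end of the victim's queue and use a proper termination-detection scheme; the break-when-empty loop here is only adequate because all tasks exist before the workers start.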