Module 4: Parallel Programming: Shared Memory and Message Passing
  Lecture 8: Optimizing Shared Memory Performance
 


Load Balancing

  • Achievable speedup is bounded above by
    • (Sequential execution time) / (maximum execution time on any processor)
    • Thus speedup is maximized when the maximum and minimum times across all processors are close, i.e., we want to minimize the variance of the parallel execution times
    • This translates directly into a load-balancing problem (a worked sketch follows this list)
  • What leads to a high variance?
    • Ultimately, all processors finish at the same time
    • But some do useful work throughout this period, while others may spend significant time waiting at synchronization points
    • This may arise from a bad partitioning
    • There may be other architectural reasons for load imbalance that are beyond the programmer's control, e.g., network congestion, unforeseen cache conflicts, etc., which slow down a few threads
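
  A small illustrative sketch (not from the lecture; the numbers and the four-processor setup are made up): given a sequential time and the measured busy time of each processor, it computes the speedup bound (sequential time / maximum per-processor time) and the variance of the per-processor times, showing how imbalance alone caps speedup.

  #include <stdio.h>

  #define P 4   /* number of processors in this made-up example */

  int main(void) {
      double t_seq = 100.0;                        /* sequential execution time */
      double t_par[P] = {30.0, 25.0, 25.0, 20.0};  /* per-processor busy times  */

      double max_t = t_par[0], mean = 0.0;
      for (int i = 0; i < P; i++) {
          if (t_par[i] > max_t) max_t = t_par[i];
          mean += t_par[i];
      }
      mean /= P;

      double var = 0.0;
      for (int i = 0; i < P; i++)
          var += (t_par[i] - mean) * (t_par[i] - mean);
      var /= P;

      /* Speedup is bounded by t_seq / max_t: here 100/30 = 3.33 on 4
         processors, even though the total work (100 units) would allow a
         speedup of 4 under a perfectly balanced partition (25 each). */
      printf("speedup bound = %.2f, variance = %.2f\n", t_seq / max_t, var);
      return 0;
  }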

Dynamic Task Queues

  • Introduced in the last lecture
  • Normally implemented as part of the parallel program
  • Two possible designs
    • Centralized task queue: a single queue of tasks shared by all processors; may suffer heavy contention because insertion into and deletion from the queue must be critical sections (both designs are sketched after this list)
    • Distributed task queues: one queue per processor
  • Issue with distributed task queues
    • When a processor's own queue becomes empty, what does it do? It steals a task from another processor's queue (task stealing)
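
  A minimal sketch of both designs, assuming POSIX threads; the task_t type, the fixed-capacity array, and the function names are illustrative, not from the lecture. Every push/pop on the centralized queue takes the single lock, which is exactly the contention point noted above; the distributed variant gives each processor its own queue and only touches another queue's lock when stealing.

  #include <pthread.h>
  #include <stdbool.h>

  #define MAX_TASKS 1024

  typedef struct { void (*run)(void *); void *arg; } task_t;  /* illustrative task type */

  /* Centralized design: one queue shared by all processors.  Insertion and
     deletion are critical sections, so every worker serializes on this
     single mutex. */
  typedef struct {
      task_t          tasks[MAX_TASKS];
      int             head, tail;
      pthread_mutex_t lock;
  } task_queue_t;

  void queue_init(task_queue_t *q) {
      q->head = q->tail = 0;
      pthread_mutex_init(&q->lock, NULL);
  }

  bool queue_push(task_queue_t *q, task_t t) {
      pthread_mutex_lock(&q->lock);               /* critical section: insert */
      bool ok = (q->tail - q->head) < MAX_TASKS;
      if (ok) q->tasks[q->tail++ % MAX_TASKS] = t;
      pthread_mutex_unlock(&q->lock);
      return ok;
  }

  bool queue_pop(task_queue_t *q, task_t *out) {
      pthread_mutex_lock(&q->lock);               /* critical section: delete */
      bool ok = q->head < q->tail;
      if (ok) *out = q->tasks[q->head++ % MAX_TASKS];
      pthread_mutex_unlock(&q->lock);
      return ok;
  }

  /* Distributed design: an array with one task_queue_t per processor.  A
     worker pops from its own queue; when that queue is empty it visits the
     other queues and takes a task from one of them (task stealing). */
  bool try_steal(task_queue_t *queues, int nproc, int self, task_t *out) {
      for (int v = 0; v < nproc; v++)
          if (v != self && queue_pop(&queues[v], out))
              return true;
      return false;
  }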