Agenda
- Partitioning for performance
- Data access and communication
- Summary
- The goal is to understand the simple trade-offs involved in writing a parallel program, with an eye on parallel performance
- Getting good performance out of a multiprocessor is difficult
- Programmers need to be careful
- A little carelessness may lead to extremely poor performance
Partitioning for performance
- Partitioning plays an important role in parallel performance
- This is where you essentially determine the tasks
- A good partitioning should achieve
- Load balance
- Minimal communication
- Low overhead to determine and manage task assignment (sometimes called extra work)
- A well-balanced parallel program automatically has low barrier or point-to-point synchronization time
- Ideally I want all the threads to arrive at a barrier at the same time
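The link between load balance and barrier time can be made concrete with a small calculation. The following is only an illustrative sketch: the helper name `barrier_wait_times` and the per-thread work times are hypothetical, and the cost of the barrier operation itself is assumed to be zero. Each thread idles at the barrier until the slowest thread arrives.

```python
def barrier_wait_times(work_times):
    """Idle time each thread spends at a barrier.

    work_times[i] is the (hypothetical) compute time of thread i before
    it reaches the barrier. The barrier releases only when the slowest
    thread arrives, so thread i waits for max(work_times) - work_times[i].
    Assumption: the barrier operation itself costs nothing.
    """
    last_arrival = max(work_times)
    return [last_arrival - t for t in work_times]


balanced = [10, 10, 10, 10]    # all threads arrive together
imbalanced = [4, 6, 10, 20]    # one thread has most of the work

print(barrier_wait_times(balanced))    # [0, 0, 0, 0]
print(barrier_wait_times(imbalanced))  # [16, 14, 10, 0]
```

In the imbalanced case, three of the four threads spend most of the interval waiting rather than doing useful work, which is exactly the synchronization time a well-balanced partitioning avoids.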
Load balancing
- Achievable speedup is bounded above by (sequential execution time) / (maximum execution time on any processor)
- Thus speedup is maximized when the maximum and minimum times across all processors are close (i.e., we want to minimize the variance of execution time across processors)
- This directly gets translated to load balancing
- What leads to a high variance?
- Ultimately all processors finish at the same time
- But some do useful work throughout this period, while others may spend a significant amount of time waiting at synchronization points
- This may arise from a bad partitioning
- There may be other architectural reasons for load imbalance that are beyond the programmer's control, e.g., network congestion or unforeseen cache conflicts (these slow down a few threads)
- Effect of decomposition/assignment on load balancing
- Static partitioning is good when the nature of computation is predictable and regular
- Dynamic partitioning normally provides better load balance, but has more runtime overhead for task management; also it may increase communication
- Fine grain partitioning (extreme is one instruction per thread) leads to more overhead, but better load balance
- Coarse grain partitioning (e.g., large tasks) may lead to load imbalance if the tasks are not well-balanced
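To illustrate how the assignment policy affects the speedup bound, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the function names, the task costs, and the simplification that dynamic assignment has zero queue-management and communication cost (the very overheads noted above are not modeled). Static assignment gives each processor a contiguous block of tasks; dynamic assignment hands the next task to whichever processor becomes free first.

```python
import heapq


def speedup_bound(task_costs, loads):
    """Upper bound on speedup: sequential time / max time on any processor."""
    return sum(task_costs) / max(loads)


def static_block_loads(task_costs, num_procs):
    """Static partitioning: contiguous blocks with equal task counts,
    fixed before execution regardless of how long each task takes."""
    chunk = (len(task_costs) + num_procs - 1) // num_procs  # ceiling division
    return [sum(task_costs[p * chunk:(p + 1) * chunk]) for p in range(num_procs)]


def dynamic_queue_loads(task_costs, num_procs):
    """Dynamic partitioning, simulated: tasks are taken in order from a
    shared queue by whichever processor becomes free first.
    Assumption: taking a task from the queue costs nothing."""
    free_at = [(0, p) for p in range(num_procs)]  # min-heap of (time free, proc)
    loads = [0] * num_procs
    for cost in task_costs:
        busy_until, p = heapq.heappop(free_at)
        loads[p] = busy_until + cost
        heapq.heappush(free_at, (loads[p], p))
    return loads


costs = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical, irregular task costs

static = static_block_loads(costs, 4)
dynamic = dynamic_queue_loads(costs, 4)

print(static, speedup_bound(costs, static))    # [3, 7, 11, 15] 2.4
print(dynamic, speedup_bound(costs, dynamic))  # [6, 8, 10, 12] 3.0
```

With these irregular costs, the dynamic policy narrows the gap between the most and least loaded processors and raises the bound from 2.4 to 3.0 on four processors. On a real machine, part of that gain would be spent on task management and possibly extra communication.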