Static assignment
- Given a decomposition it is possible to assign tasks statically
- For example, some computation on an array of size N can be decomposed statically by assigning a range of indices to each process: with k processes, P0 operates on indices 0 to (N/k)-1, P1 on N/k to (2N/k)-1, …, P(k-1) on (k-1)N/k to N-1
- For regular computations this works great: simple and low-overhead
- What if the nature of the computation depends on the index?
- For certain index ranges you do some heavy-weight computation while for others you do something simple
- Is there a problem?
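The block partition above can be sketched in a few lines. This is a minimal sketch in Python; the helper name `block_range` and the values N = 100, k = 4 are illustrative assumptions, not from the notes (integer floor division is used so the sketch also handles N not divisible by k):

```python
# Static assignment: process p gets a contiguous block of indices.
# block_range is a hypothetical helper; N is the array size, k the
# number of processes.

def block_range(p, k, N):
    """Indices assigned to process p: p*N//k .. (p+1)*N//k - 1."""
    return range(p * N // k, (p + 1) * N // k)

# Example with N = 100 and k = 4: each process gets 25 indices.
for p in range(4):
    r = block_range(p, 4, 100)
    print(f"P{p}: indices {r.start} to {r.stop - 1}")
```

For N = 100 and k = 4 this prints the four ranges 0-24, 25-49, 50-74, and 75-99, matching the P0 … P(k-1) partition described above.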
Dynamic assignment
- Static assignment may lead to load imbalance depending on how irregular the application is
- Dynamic decomposition/assignment solves this issue by allowing a process to dynamically choose any available task whenever it is done with its previous task
- Normally in this case you decompose the program in such a way that the number of available tasks is larger than the number of processes
- Same example: divide the array into portions of 10 indices each, giving N/10 tasks
- An idle process grabs the next available task
- Provides better load balance since longer tasks can execute concurrently with the shorter ones
- Dynamic assignment comes with its own overhead
- Now you need to maintain a shared count of the number of available tasks
- The update of this variable must be protected by a lock
- Need to be careful so that this lock contention does not outweigh the benefits of dynamic decomposition
- In more complicated applications a task may not just operate on an index range, but could manipulate a subtree or a complex data structure
- Normally a dynamic task queue is maintained, where each task is typically represented by a pointer to its data
- The task queue gets populated as new tasks are discovered
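The shared-counter scheme above can be sketched as follows. This is a minimal illustration in Python using threads; the names (`next_task`, `worker`, `CHUNK`), the squaring computation, and the values N = 1000, 4 threads are assumptions for the example, not from the notes:

```python
import threading

N = 1000
CHUNK = 10                     # each task covers 10 indices, as in the example
data = list(range(N))
results = [0] * N

next_task = 0                  # shared count of the next available task
lock = threading.Lock()        # protects updates to next_task

def worker():
    global next_task
    while True:
        with lock:             # grab the next available task atomically
            task = next_task
            next_task += 1
        start = task * CHUNK
        if start >= N:
            return             # no tasks left; this process goes idle
        for i in range(start, min(start + CHUNK, N)):
            results[i] = data[i] * data[i]   # the per-index computation

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note that the lock is held only long enough to claim a task number, not for the computation itself; keeping this critical section short is exactly how one avoids the lock contention mentioned above.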
Decomposition types
- Decomposition by data
- The most commonly found decomposition technique
- The data set is partitioned into several subsets and each subset is assigned to a process
- The type of computation may or may not be identical on each subset
- Very easy to program and manage
- Computational decomposition
- Not so popular: tricky to program and manage
- All processes operate on the same data, but may carry out different kinds of computation
- More common in systolic arrays, pipelined graphics processing units (GPUs), etc.
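The contrast between the two decomposition types can be illustrated with a small pipeline: both stages see the same data stream, but each performs a different computation. This is a hedged sketch in Python; the stage names, the scale/offset computations, and the 8-item input are illustrative assumptions:

```python
# Computational decomposition sketch: two pipeline stages carry out
# different computations on the same data stream, connected by a queue.
import queue
import threading

items = list(range(8))
q = queue.Queue()
out = []

def stage1():
    # First kind of computation: scale each item.
    for x in items:
        q.put(x * 2)
    q.put(None)            # sentinel: end of stream

def stage2():
    # Second kind of computation: offset each item.
    while (x := q.get()) is not None:
        out.append(x + 1)

t1 = threading.Thread(target=stage1)
t2 = threading.Thread(target=stage2)
t1.start(); t2.start()
t1.join(); t2.join()
print(out)   # every item passed through both stages, in order
```

In a data decomposition, by contrast, both threads would run the *same* code on disjoint halves of `items`, as in the static-assignment example earlier.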
Orchestration
- Involves structuring communication and synchronization among processes, organizing data structures to improve locality, and scheduling tasks
- This step normally depends on the programming model and the underlying architecture
- Goal is to
- Reduce communication and synchronization costs
- Maximize locality of data reference
- Schedule tasks to maximize concurrency: do not schedule dependent tasks in parallel
- Reduce overhead of parallelization and concurrency management (e.g., management of the task queue, overhead of initiating a task etc.)