Module 7: "Parallel Programming"
  Lecture 12: "Steps in Writing a Parallel Program"
 

Static assignment

  • Given a decomposition it is possible to assign tasks statically
    • For example, a computation on an array of size N can be decomposed statically by assigning a fixed range of indices to each process: with k processes (assuming k divides N), P0 operates on indices 0 to (N/k)-1, P1 on N/k to (2N/k)-1, …, and Pk-1 on (k-1)N/k to N-1
    • For regular computations this works great: simple and low-overhead
  • What if the nature of the computation depends on the index?
    • For certain index ranges you do some heavy-weight computation while for others you do something simple
    • Is there a problem?

Dynamic assignment

  • Static assignment may lead to load imbalance depending on how irregular the application is
  • Dynamic decomposition/assignment solves this issue by allowing an idle process to pick up any available task as soon as it finishes its previous one
    • Normally in this case you decompose the program in such a way that the number of available tasks is larger than the number of processes
    • Same example: divide the array into portions each with 10 indices; so you have N/10 tasks
    • An idle process grabs the next available task
    • Provides better load balance since longer tasks can execute concurrently with the smaller ones
  • Dynamic assignment comes with its own overhead
    • Now you need to maintain a shared count of the number of available tasks
    • The update of this variable must be protected by a lock
    • Need to be careful so that this lock contention does not outweigh the benefits of dynamic decomposition
  • In more complicated applications a task may not operate on just an index range, but could manipulate a subtree or some other complex data structure
    • Normally a dynamic task queue is maintained, where each task entry is typically a pointer to the data it operates on
    • The task queue gets populated as new tasks are discovered

Decomposition types

  • Decomposition by data
    • The most commonly found decomposition technique
    • The data set is partitioned into several subsets and each subset is assigned to a process
    • The type of computation may or may not be identical on each subset
    • Very easy to program and manage
  • Computational decomposition
    • Not so popular: tricky to program and manage
    • All processes operate on the same data, but may each carry out a different kind of computation
    • More common in systolic arrays, pipelined graphics processing units (GPUs), etc.

Orchestration

  • Involves structuring communication and synchronization among processes, organizing data structures to improve locality, and scheduling tasks
    • This step normally depends on the programming model and the underlying architecture
  • Goal is to
    • Reduce communication and synchronization costs
    • Maximize locality of data reference
    • Schedule tasks to maximize concurrency: do not schedule dependent tasks in parallel
    • Reduce overhead of parallelization and concurrency management (e.g., management of the task queue, overhead of initiating a task etc.)