Module 3: Fundamentals of Parallel Computers: ILP vs TLP
  Lecture 6: Preliminaries of Parallel Programming
 


Decomposition

while (!done)
    diff = 0.0;
    for_all i = 0 to n-1
        for_all j = 0 to n-1
            temp = A[i, j];
            A[i, j] = 0.2*(A[i, j]+A[i, j+1]+A[i, j-1]+A[i-1, j]+A[i+1, j]);
            diff += fabs(A[i, j] - temp);
        end for_all
    end for_all
    if (diff/(n*n) < TOL) then done = 1;
end while

  • Offers concurrency across elements: degree of concurrency is n^2
  • Make the j loop sequential to obtain a row-wise decomposition: degree of concurrency is n
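
The row-wise decomposition can be sketched in C as follows. This is only an illustration, not the lecture's code: the names sweep_row and sweep are hypothetical, and the grid is assumed to be stored row-major with a one-element border, so the interior rows and columns are 1..n and every neighbor access stays in bounds. Each call to sweep_row is one task. Here the tasks run one after another, whereas the for_all model treats them as concurrent.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* One task of the row-wise decomposition: update interior row i (1..n).
   ASSUMPTION: A holds (n+2) x (n+2) doubles, row-major, with a border.
   Returns this row's contribution to diff. */
double sweep_row(double *A, int n, int i)
{
    assert(i >= 1 && i <= n);
    size_t w = (size_t)n + 2;                 /* padded row width */
    double *up = A + (size_t)(i - 1) * w;     /* row i-1 */
    double *cur = up + w;                     /* row i   */
    double *down = cur + w;                   /* row i+1 */
    double diff = 0.0;

    for (int j = 1; j <= n; j++) {            /* sequential j loop */
        double temp = cur[j];
        cur[j] = 0.2 * (cur[j] + cur[j + 1] + cur[j - 1] + up[j] + down[j]);
        diff += fabs(cur[j] - temp);
    }
    return diff;
}

/* One full sweep: n row tasks, executed sequentially in this sketch. */
double sweep(double *A, int n)
{
    double diff = 0.0;
    for (int i = 1; i <= n; i++)
        diff += sweep_row(A, n, i);
    return diff;
}
```

Calling sweep repeatedly until diff/(n*n) < TOL corresponds to the while loop above.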

Assignment

  • Possible static assignment: block row decomposition
    • Process 0 gets rows 0 to (n/p)-1, process 1 gets rows n/p to (2n/p)-1, etc.
  • Another static assignment: cyclic row decomposition
    • Process 0 gets rows 0, p, 2p, …; process 1 gets rows 1, p+1, 2p+1, …
  • Dynamic assignment
    • Grab the next available row, work on it, grab a new row, …
  • Static block row assignment minimizes nearest-neighbor communication by assigning contiguous rows to the same process: only the rows at block boundaries need data from another process
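
The three assignment schemes above can be sketched as small C helpers. All function names here are hypothetical, and the block helpers assume that p divides n, as the formulas in the bullets do. The dynamic scheme is shown with a shared C11 atomic counter, which is one possible way to "grab the next available row", not necessarily the one intended in the lecture.

```c
#include <assert.h>
#include <stdatomic.h>

/* Block row assignment: process pid owns rows
   pid*(n/p) .. (pid+1)*(n/p)-1. ASSUMPTION: p divides n. */
int block_first_row(int pid, int n, int p)
{
    assert(p > 0 && n % p == 0);
    return pid * (n / p);
}

int block_last_row(int pid, int n, int p)
{
    return block_first_row(pid + 1, n, p) - 1;
}

/* Which process owns a given row under block assignment. */
int block_owner(int row, int n, int p)
{
    assert(p > 0 && n % p == 0);
    return row / (n / p);
}

/* Cyclic row assignment: process pid owns rows pid, pid+p, pid+2p, ... */
int cyclic_owner(int row, int p)
{
    assert(p > 0);
    return row % p;
}

/* Dynamic assignment: atomically take the next unprocessed row from a
   shared counter. Returns -1 once all n rows have been handed out. */
int grab_next_row(atomic_int *next, int n)
{
    int row = atomic_fetch_add(next, 1);
    return row < n ? row : -1;
}
```

With n = 8 and p = 4, block_owner gives rows 4 and 5 to the same process, so updating row 4 needs remote data only for row 3. Under cyclic_owner, rows 3, 4, and 5 all belong to different processes, so both neighbor rows are remote.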