Module 7: "Parallel Programming"
  Lecture 13: "Parallelizing a Sequential Program"
 

Decomposition of Iterative Equation Solver

  • Look for concurrency in loop iterations
    • In this case the iterations are truly dependent on one another
    • Iteration (i, j) depends on iterations (i, j-1) and (i-1, j)
    • Each anti-diagonal can be computed in parallel
    • Must synchronize after each anti-diagonal (or use point-to-point synchronization between dependent points)
  • Alternative: red-black ordering (different update pattern)
    • Can update all red points first, synchronize globally with a barrier, and then update all black points
    • May converge faster or slower compared to sequential program
    • Converged equilibrium may also be different if there are multiple solutions
    • Ocean simulation uses this decomposition
  • We will ignore the loop-carried dependence and go ahead with a straightforward loop decomposition
    • Allow updates to all points in parallel
    • This is yet another different update order and may affect convergence
    • Update to a point may or may not see the new updates to the nearest neighbors (this parallel algorithm is non-deterministic)
    • while (!done)
         diff = 0.0;
         for_all i = 0 to n-1
            for_all j = 0 to n-1
               temp = A[i, j];
               A[i, j] = 0.2*(A[i, j]+A[i, j+1]+A[i, j-1]+A[i-1, j]+A[i+1, j]);
               diff += fabs(A[i, j] - temp);
            end for_all
         end for_all
         if (diff/(n*n) < TOL) then done = 1;
      end while
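
The two dependence-respecting orderings listed above (anti-diagonals and red-black) can be pictured as groupings of grid indices. The following is only a minimal sketch, not code from the lecture: the helper names anti_diagonals, red_black and independent are hypothetical, and the grid is assumed to be an n x n set of points indexed from 0. Within each group, no point is a nearest neighbor of another point in the same group, which is why a whole group can be updated in parallel before the next synchronization.

```python
def anti_diagonals(n):
    """Group the n x n grid points by anti-diagonal index d = i + j.

    Point (i, j) depends on (i, j-1) and (i-1, j), which both lie on
    anti-diagonal d - 1, so every point in one group can be updated in parallel.
    """
    groups = [[] for _ in range(2 * n - 1)]
    for i in range(n):
        for j in range(n):
            groups[i + j].append((i, j))
    return groups


def red_black(n):
    """Color the grid like a checkerboard: red if i + j is even, black otherwise."""
    red = [(i, j) for i in range(n) for j in range(n) if (i + j) % 2 == 0]
    black = [(i, j) for i in range(n) for j in range(n) if (i + j) % 2 == 1]
    return red, black


def independent(group):
    """Return True if no point in the group is a nearest neighbor of another one."""
    members = set(group)
    return all(
        (i + di, j + dj) not in members
        for (i, j) in group
        for (di, dj) in ((0, 1), (0, -1), (1, 0), (-1, 0))
    )


if __name__ == "__main__":
    for d, group in enumerate(anti_diagonals(4)):
        print(d, group)
```

Note the difference in synchronization cost under these assumptions: the anti-diagonal ordering needs one synchronization per group (2n - 1 per sweep), while red-black needs only two phases per sweep separated by a barrier.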

  • Offers concurrency across elements: degree of concurrency is n^2
  • Make the j loop sequential to have row-wise decomposition: degree of concurrency is n
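
A row-wise decomposition can be sketched as one task per row, with the j loop running sequentially inside each task. This is a hedged illustration rather than the lecture's implementation: the names sweep_row and sweep are hypothetical, the grid is assumed to be stored as a list of lists with one extra layer of fixed boundary points around the n x n interior, and the n row tasks are executed one after another here so that the result is reproducible (a real parallel run would hand them to separate processes or threads).

```python
def sweep_row(A, i):
    """One task of the row-wise decomposition: update row i left to right.

    Returns the partial sum of absolute changes for this row.
    Assumes row 0, the last row, and the first and last columns are boundary points.
    """
    diff = 0.0
    for j in range(1, len(A[i]) - 1):  # sequential j loop
        temp = A[i][j]
        A[i][j] = 0.2 * (A[i][j] + A[i][j + 1] + A[i][j - 1]
                         + A[i - 1][j] + A[i + 1][j])
        diff += abs(A[i][j] - temp)
    return diff


def sweep(A):
    """Run one sweep as n row tasks and combine their partial diffs."""
    n = len(A) - 2  # number of interior rows, i.e. the degree of concurrency
    # Each call below is an independent unit of work in the decomposition;
    # here the calls simply run in order.
    return sum(sweep_row(A, i) for i in range(1, n + 1))


if __name__ == "__main__":
    grid = [[1.0, 1.0, 1.0],
            [1.0, 0.0, 1.0],
            [1.0, 1.0, 1.0]]
    print(sweep(grid), grid[1][1])
```

Returning a partial diff per row, instead of adding to a single shared diff, is a design choice of this sketch: it keeps each task free of shared writes until the partial sums are combined.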