Module 8: "Performance Issues"
  Lecture 14: "Load Balancing and Domain Decomposition"
 

Domain decomposition

  • Applications normally show a local bias in data usage
    • Communication is short-range, e.g., nearest neighbor
    • Even if it is long-range, it falls off with distance
    • View the dataset of an application as the domain of the problem, e.g., the 2-D grid in the equation solver
    • In most applications, a point in this domain depends on points that are close by
    • Partitioning can exploit this property by assigning contiguous pieces of data to each process
    • The exact shape of the decomposed domain depends on the application and load balancing requirements
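
As a rough illustration of why contiguous pieces exploit locality, the sketch below assigns an n×n grid to p processes in square blocks and measures how many nearest-neighbor accesses stay inside the owning process. The function names, and the assumptions that p is a perfect square and that √p divides n, are illustrative choices and do not come from the lecture.

```python
import math

def square_block_owner(i, j, n, p):
    """Return the rank of the process that owns grid point (i, j).

    Assumption for this sketch: p is a perfect square and sqrt(p) divides n.
    """
    q = math.isqrt(p)                 # processes along each grid dimension
    if q * q != p or n % q != 0:
        raise ValueError("p must be a perfect square whose root divides n")
    b = n // q                        # side length of each contiguous block
    return (i // b) * q + (j // b)

def local_neighbor_fraction(n, p):
    """Fraction of nearest-neighbor accesses that stay within the owner."""
    local = total = 0
    for i in range(n):
        for j in range(n):
            me = square_block_owner(i, j, n, p)
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < n and 0 <= nj < n:   # skip off-grid neighbors
                    total += 1
                    local += square_block_owner(ni, nj, n, p) == me
    return local / total
```

Only the points on block boundaries have neighbors owned by another process, so the local fraction grows as the blocks get larger relative to their perimeter.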

Comm-to-comp ratio

  • A particular problem can admit many different domain decompositions
    • For the grid solver we may have a square block decomposition, block row decomposition or cyclic row decomposition
    • How do we determine which one is good? Compare their communication-to-computation ratios
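
The two row-based decompositions named above can be sketched as owner functions that map a row index to a process rank. This is a minimal sketch, assuming p divides n evenly; the function names are hypothetical and chosen only for illustration.

```python
def block_row_owner(i, n, p):
    """Owner of row i when each process gets one strip of n/p contiguous rows."""
    if n % p != 0:
        raise ValueError("this sketch assumes p divides n")
    return i // (n // p)

def cyclic_row_owner(i, p):
    """Owner of row i when rows are dealt out to processes round-robin."""
    return i % p

if __name__ == "__main__":
    n, p = 8, 4
    print([block_row_owner(i, n, p) for i in range(n)])  # [0, 0, 1, 1, 2, 2, 3, 3]
    print([cyclic_row_owner(i, p) for i in range(n)])    # [0, 1, 2, 3, 0, 1, 2, 3]
```

With cyclic assignment no two adjacent rows share an owner (for p > 1), so every neighboring row must be fetched from another process.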

Assume P processors and an N×N grid for the grid solver

Size of each block: N/√P by N/√P

Communication (perimeter): 4N/√P
Computation (area): N²/P
Comm-to-comp ratio = 4√P/N

[Figure: square block decomposition for P = 16]

  • For block row decomposition
    • Each strip has N/P rows
    • Communication (boundary rows): 2N
    • Computation (area): N²/P (same as square block)
    • Comm-to-comp ratio: 2P/N
  • For cyclic row decomposition
    • Each processor gets N/P isolated rows
    • Communication: 2N²/P (two neighboring rows of N points for each of the N/P rows)
    • Computation: N²/P
    • Comm-to-comp ratio: 2
  • Normally N is much larger than P
    • Asymptotically, square block yields the lowest comm-to-comp ratio: 4√P/N < 2P/N whenever P > 4
  • The idea is to measure the volume of inherent communication per unit of computation
    • In most cases it is beneficial to pick the decomposition with the lowest comm-to-comp ratio
    • But this depends on the application structure, i.e., the decomposition with the lowest comm-to-comp ratio may have other problems, e.g., load imbalance
    • Normally this ratio gives a rough estimate of the average communication bandwidth requirement of the application, i.e., how frequent communication is
    • But it does not tell you the nature of communication, i.e., bursty or uniform
    • For the grid solver, communication happens only at the start of each iteration; it is not uniformly distributed over the computation
    • Thus the worst-case bandwidth requirement may exceed the average implied by the comm-to-comp ratio
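
The three ratios derived above can be compared numerically. The following is a minimal sketch that plugs N and P into the perimeter and area counts from this lecture; the function name `comm_to_comp` and the dictionary keys are hypothetical, and the formulas assume P is a perfect square that divides N evenly.

```python
import math

def comm_to_comp(n, p):
    """Per-process comm-to-comp ratios for an n x n grid on p processes.

    Uses the counts from the lecture: communication is the data fetched from
    other processes per iteration, computation is the number of owned points.
    """
    comp = n * n / p                                    # area: N^2 / P
    return {
        "square_block": (4 * n / math.sqrt(p)) / comp,  # 4*sqrt(P)/N
        "block_row": (2 * n) / comp,                    # 2P/N
        "cyclic_row": (2 * n * n / p) / comp,           # 2
    }

if __name__ == "__main__":
    for name, ratio in comm_to_comp(1024, 16).items():
        print(f"{name:12s} {ratio:.6f}")
```

For N = 1024 and P = 16 this gives 0.015625 for square block, 0.03125 for block row, and 2 for cyclic row. These are averages over an iteration and say nothing about whether the traffic is bursty.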