Module 8: "Performance Issues"
  Lecture 15: "Locality and Communication Optimizations"
 

Hot-spots

  • Avoid a location hot-spot either by staggering accesses to the same location or by designing the algorithm to use tree-structured communication
  • Module hot-spot
    • Normally happens when a particular node saturates while handling too many messages (not necessarily to the same memory location) within a short period of time
    • The usual solution, again, is to design the algorithm so that these messages are staggered over time
  • Rule of thumb: design the communication pattern so that it is not bursty; distribute it uniformly over time
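
The effect of tree-structured communication on a hot-spot can be illustrated with a small message-counting sketch. It is not taken from the lecture: the function names and the choice of a binary tree rooted at node 0 are assumptions made for illustration. It counts how many messages each node receives when every node sends directly to node 0, versus when partial results are combined pairwise over several rounds.

```python
from collections import Counter


def flat_reduction_load(num_nodes):
    """Every node sends its value straight to node 0 (hypothetical root).

    Returns a Counter mapping node id -> number of messages received.
    """
    received = Counter()
    for _sender in range(1, num_nodes):
        received[0] += 1
    return received


def tree_reduction_load(num_nodes):
    """Binary-tree reduction: in each round, node r + stride sends to node r.

    Returns a Counter mapping node id -> number of messages received
    summed over all rounds. Within one round a node receives at most one
    message, so the traffic at the root is spread over time.
    """
    received = Counter()
    stride = 1
    while stride < num_nodes:
        for receiver in range(0, num_nodes, 2 * stride):
            sender = receiver + stride
            if sender < num_nodes:
                received[receiver] += 1
        stride *= 2
    return received


if __name__ == "__main__":
    flat = flat_reduction_load(16)
    tree = tree_reduction_load(16)
    print("flat: busiest node receives", max(flat.values()), "messages")
    print("tree: busiest node receives", max(tree.values()), "messages")
```

Both schemes send the same total number of messages (P - 1 for P nodes); the tree only changes where and when they arrive, which is the point of the rule of thumb above.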

Overlap

  • Increase overlap between communication and computation
    • Not much can be done at the algorithm level unless the programming model and/or OS provide primitives for prefetching, block data transfer, non-blocking receive, etc.
    • Normally, these techniques increase bandwidth demand because you end up communicating the same amount of data in a shorter amount of time (execution time hopefully goes down if you can exploit overlap)
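
A minimal sketch of overlapping communication with computation, assuming a prefetch-style primitive is available. Here a background thread stands in for that primitive; `fetch_block` and `compute` are hypothetical placeholders (the fetch merely sleeps to mimic latency), not part of any real communication library. While block i is being processed, the request for block i + 1 is already in flight.

```python
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4  # illustrative block size, chosen arbitrarily


def fetch_block(block_id, latency=0.01):
    """Hypothetical remote fetch: sleeps to mimic communication latency."""
    time.sleep(latency)
    start = block_id * BLOCK_SIZE
    return list(range(start, start + BLOCK_SIZE))


def compute(block):
    """Hypothetical local work on one block of data."""
    return sum(x * x for x in block)


def process_with_prefetch(num_blocks):
    """Process blocks in order, prefetching the next one during compute."""
    total = 0
    if num_blocks == 0:
        return total
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_block, 0)  # issue the first fetch
        for block_id in range(num_blocks):
            block = pending.result()  # wait only if the data is not here yet
            if block_id + 1 < num_blocks:
                # Start the next transfer before computing on this block.
                pending = pool.submit(fetch_block, block_id + 1)
            total += compute(block)
    return total


if __name__ == "__main__":
    print(process_with_prefetch(4))
```

The amount of data fetched is unchanged; only the timing of the requests differs, which is why overlap tends to raise bandwidth demand even as it hides latency.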

Summary

  • Comparison of sequential and parallel execution
    • Sequential execution time = busy useful time + local data access time
    • Parallel execution time = busy useful time + busy overhead (extra work) + local data access time + remote data access time + synchronization time
    • Busy useful time in parallel execution is ideally equal to sequential busy useful time / number of processors
    • Local data access time in parallel execution is also lower than in sequential execution because, with P processors, each processor ideally accesses less than a 1/P fraction of the data locally (some data now become remote)
  • Parallel programs introduce three overhead terms: busy overhead (extra work), remote data access time, and synchronization time
    • Goal of a good parallel program is to minimize these three terms
    • Goal of a good parallel computer architecture is to provide sufficient support to let programmers optimize these three terms (and this is the focus of the rest of the course)
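
The comparison above can be written compactly as follows. The symbols are introduced here for illustration and do not appear in the lecture: $P$ is the number of processors, and the superscripts distinguish sequential from parallel quantities.

```latex
\begin{align*}
T_{\text{seq}} &= T_{\text{busy}}^{\text{seq}} + T_{\text{local}}^{\text{seq}} \\
T_{\text{par}} &= T_{\text{busy}}^{\text{par}} + T_{\text{extra}}
                 + T_{\text{local}}^{\text{par}} + T_{\text{remote}} + T_{\text{sync}} \\
T_{\text{busy}}^{\text{par}} &\approx \frac{T_{\text{busy}}^{\text{seq}}}{P}
  \quad \text{(ideal case)} \\
\text{Speedup} &= \frac{T_{\text{seq}}}{T_{\text{par}}}
\end{align*}
```

Under these assumptions, speedup approaches $P$ only as the three overhead terms $T_{\text{extra}}$, $T_{\text{remote}}$, and $T_{\text{sync}}$ become small relative to the useful work per processor.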