Module 5: Performance Issues in Shared Memory and Introduction to Coherence
  Lecture 9: Performance Issues in Shared Memory
 


Temporal Locality

  • Maximize reuse of data
    • Schedule tasks that access the same data in close succession
    • Many linear algebra kernels use blocking of matrices to improve temporal (and spatial) locality
    • Example: the transpose phase in the Fast Fourier Transform (FFT); to improve locality, the algorithm carries out a blocked transpose, i.e., it transposes one block of data at a time
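
The blocked transpose mentioned above can be sketched roughly as follows. This is a minimal illustration, not the FFT code the lecture refers to: the matrix size N, the block size B, and the names blocked_transpose and is_transpose are all assumptions made for this example, and B is assumed to divide N.

```c
#include <stddef.h>

#define N 8   /* matrix dimension (assumed for illustration) */
#define B 4   /* block (tile) size; assumed to divide N evenly */

/* Transpose src into dst one B x B block at a time, so the rows of src
 * and dst touched by a block are reused before the loop moves on. */
static void blocked_transpose(double dst[N][N], double src[N][N])
{
    for (size_t ii = 0; ii < N; ii += B)            /* block row    */
        for (size_t jj = 0; jj < N; jj += B)        /* block column */
            for (size_t i = ii; i < ii + B; i++)    /* rows inside the block    */
                for (size_t j = jj; j < jj + B; j++) /* columns inside the block */
                    dst[j][i] = src[i][j];
}

/* Return 1 if t is the transpose of a, 0 otherwise. */
static int is_transpose(double t[N][N], double a[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            if (t[j][i] != a[i][j])
                return 0;
    return 1;
}
```

In practice B would likely be chosen so that a source block and a destination block fit in the cache together; the value 4 here is only a placeholder.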

Spatial Locality

  • Consider a square block decomposition of the grid solver and a C-like row-major layout, i.e., A[i][j] and A[i][j+1] occupy contiguous memory locations
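
As a small sketch of the row-major layout described above, the helper below measures how far apart two entries of a 2D C array sit in memory. The grid size N and the function name elem_distance are hypothetical and chosen only for this illustration.

```c
#include <stddef.h>

#define N 8   /* grid dimension (assumed for illustration) */

/* Distance, in units of one double, from A[i0][j0] to A[i1][j1].
 * The subtraction is done on char pointers into the whole array A,
 * then scaled by the element size. */
static ptrdiff_t elem_distance(double A[N][N],
                               size_t i0, size_t j0,
                               size_t i1, size_t j1)
{
    const char *from = (const char *)&A[i0][j0];
    const char *to   = (const char *)&A[i1][j1];
    return (to - from) / (ptrdiff_t)sizeof(double);
}
```

With this layout, elem_distance(A, i, j, i, j + 1) is 1, while elem_distance(A, i, j, i + 1, j) is N: neighbors along a row are adjacent in memory, whereas neighbors along a column are a full row apart.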