Module 8: "Performance Issues"
  Lecture 15: "Locality and Communication Optimizations"
 

Artifactual communication

  • Communication caused by artifacts of the extended memory hierarchy
    • Data accesses not satisfied in the cache or local memory cause communication
    • Inherent communication is caused by data transfers determined by the program
    • Artifactual communication is caused by:
      • poor allocation of data across distributed memories
      • unnecessary data in a transfer
      • unnecessary transfers due to system-dependent transfer granularity
      • redundant communication of data
      • finite replication capacity (in cache or memory)
  • Inherent communication assumes infinite capacity and perfect knowledge of what should be transferred
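The effect of transfer granularity can be illustrated with a little arithmetic. The sketch below is only an illustration: the function name `traffic_bytes`, the 8-byte word size, and the 64-byte transfer unit are assumptions, not values from the lecture. It compares the bytes a program actually needs (an approximation of the inherent part) with the bytes moved when every transfer brings in a whole line.

```python
def traffic_bytes(addresses, word_size=8, line_size=64):
    """Return (useful_bytes, transferred_bytes) for a list of byte addresses.

    useful_bytes counts the distinct words touched; transferred_bytes counts
    the distinct lines touched, assuming a whole line moves per transfer.
    Both sizes are hypothetical defaults chosen for illustration.
    """
    words = {addr // word_size for addr in addresses}
    lines = {addr // line_size for addr in addresses}
    return len(words) * word_size, len(lines) * line_size


# One word out of every line: most of each transfer is unnecessary data.
strided = [i * 64 for i in range(16)]
print(traffic_bytes(strided))      # (128, 1024)

# Sixteen adjacent words: every transferred byte is used.
contiguous = [i * 8 for i in range(16)]
print(traffic_bytes(contiguous))   # (128, 128)
```

In the strided case the program needs 128 bytes but 1024 bytes move, so under these assumptions seven eighths of the traffic is artifactual.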

Capacity problem

  • The most likely cause of artifactual communication
    • Due to finite capacity of cache, local memory or remote memory
    • May view a multiprocessor as a three-level memory hierarchy for this purpose: local cache, local memory, remote memory
    • Inherent communication and communication due to cold (compulsory) misses are independent of capacity
    • Capacity and conflict misses generate communication resulting from finite capacity
    • Generated traffic may be local or remote depending on the allocation of pages
    • General technique: exploit spatial and temporal locality to use the cache effectively
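The split between cold misses and capacity-induced misses can be seen in a toy simulation. The following is a minimal sketch, assuming a fully associative cache with LRU replacement (so there are no conflict misses); the function name `count_misses` and the trace are hypothetical. Every miss would turn into traffic to local or remote memory, depending on where the page is allocated.

```python
from collections import OrderedDict


def count_misses(trace, capacity):
    """Count misses for a trace of line numbers in a fully associative
    LRU cache holding `capacity` lines (an assumed, simplified model)."""
    cache = OrderedDict()
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


# Two sweeps over a working set of 8 lines.
trace = list(range(8)) * 2
cold = len(set(trace))                     # 8 cold misses at any capacity

print(count_misses(trace, capacity=8))     # 8: cold misses only
print(count_misses(trace, capacity=4))     # 16: 8 cold + 8 capacity misses
```

When the working set fits, only the cold misses remain. When it does not, the second sweep misses on every line, which is the extra communication attributed to finite capacity.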

Temporal locality

  • Maximize reuse of data
    • Schedule tasks that access same data in close succession
    • Many linear algebra kernels use blocking of matrices to improve temporal (and spatial) locality
    • Example: the transpose phase in the Fast Fourier Transform (FFT); to improve locality, the algorithm carries out a blocked transpose, i.e., it transposes one block of data at a time

      Figure: Block transpose
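A blocked transpose can be sketched as follows. This is an illustrative version, not the FFT code the lecture refers to: the function name `blocked_transpose` and the default block size are assumptions, and pure Python will not show the cache benefit. What it shows is the access order, in which all elements of one block are read and written before moving to the next block.

```python
def blocked_transpose(a, block=2):
    """Return the transpose of a rectangular list-of-lists matrix,
    visiting it one block x block tile at a time."""
    n_rows = len(a)
    n_cols = len(a[0]) if n_rows else 0
    out = [[None] * n_rows for _ in range(n_cols)]
    for i0 in range(0, n_rows, block):          # tile row origin
        for j0 in range(0, n_cols, block):      # tile column origin
            # Transpose one tile; edge tiles may be smaller than `block`.
            for i in range(i0, min(i0 + block, n_rows)):
                for j in range(j0, min(j0 + block, n_cols)):
                    out[j][i] = a[i][j]
    return out


m = [[1, 2, 3],
     [4, 5, 6]]
print(blocked_transpose(m, block=2))   # [[1, 4], [2, 5], [3, 6]]
```

In a compiled implementation the block size would typically be chosen so that a source tile and a destination tile fit in the cache together; that choice is machine-dependent and is not specified in the lecture.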