Module 13: INTRODUCTION TO COMPILERS FOR HIGH PERFORMANCE COMPUTERS
  Lecture 25: Supercomputing Applications
 


Spatial Locality

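  • Instruction count can be reduced, but it is not the only issue
  • Performance also depends on memory access patterns
  • Use the cache to improve performance
  • Matrices are stored in row-major order
  • Fetching A[i,k] shows good spatial locality: with k innermost, consecutive accesses stay in the same row
  • Fetching B[k,j] is slow: with k innermost, consecutive accesses walk down a column and land n elements apart
  • Re-order the loops so that the innermost loop runs along rows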
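
For contrast, here is a minimal C sketch of the loop order these points appear to describe, assuming the original program ran i, j, k with k innermost. The function name matmul_ijk, the flat row-major arrays, and the 0-based indices are illustrative assumptions, not part of the lecture.

```c
#include <stddef.h>

/* Baseline i-j-k order: C += A * B for n x n row-major matrices.
 * Hypothetical helper for illustration; indices are 0-based. */
void matmul_ijk(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = c[i * n + j];
            for (size_t k = 0; k < n; k++) {
                /* a[i*n + k]: adjacent addresses as k advances (stride 1).
                 * b[k*n + j]: jumps n elements as k advances (stride n). */
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}
```

The stride-n walk through b is what makes the B[k,j] fetch expensive once a column no longer fits in cache.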

Improve Spatial Locality

for i := 1 to n do
  for k := 1 to n do
    for j := 1 to n do
      c[i,j] := c[i,j] + a[i,k] * b[k,j]
    endfor
  endfor
endfor
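
The re-ordered loop nest above could be written in C roughly as follows. This is a sketch under assumptions: the name matmul_ikj, the flat row-major layout, and the 0-based indices are illustrative choices, and the caller is expected to initialise c.

```c
#include <stddef.h>

/* i-k-j order: C += A * B for n x n row-major matrices.
 * Hypothetical helper for illustration; indices are 0-based. */
void matmul_ikj(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            /* a[i*n + k] does not change inside the j loop. */
            double aik = a[i * n + k];
            for (size_t j = 0; j < n; j++) {
                /* Both c[i*n + j] and b[k*n + j] advance with stride 1. */
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
}
```

With j innermost, both the row of b and the row of c are read sequentially, so each cache line fetched is fully used before it is evicted.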


Temporal Locality

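  • The previous program has no temporal locality when the matrices are large
  • The entire matrix B is fetched again for each value of i
  • If a row of B or C is too large to stay in cache, the program does not benefit from temporal locality
  • Use sub-matrix multiplication, so that each sub-matrix is reused while it is still in cache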
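
As a rough guide to how large a sub-matrix can be, one common rule of thumb (an assumption here, not a claim from the lecture) is that one s × s sub-matrix each of A, B, and C should fit in the cache together. Writing M for the cache capacity in bytes and w for the element size in bytes, both hypothetical symbols:

```latex
3 s^2 w \le M
\quad\Longrightarrow\quad
s \le \sqrt{\frac{M}{3w}}
% Example: M = 32768 bytes (32 KiB), w = 8 bytes (double precision)
% s \le \sqrt{32768 / 24} \approx 36.9, so s = 32 is a convenient choice
```

Real caches have limited associativity and share space with other data, so the best s in practice is usually found by measurement.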

Improve Temporal Locality

for it := 1 to n by s do
  for kt := 1 to n by s do
    for jt := 1 to n by s do
      for i := it to min(it+s-1, n) do
        for k := kt to min(kt+s-1, n) do
          for j := jt to min(jt+s-1, n) do
            c[i,j] := c[i,j] + a[i,k] * b[k,j]
          endfor
        endfor
      endfor
    endfor
  endfor
endfor
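
The sub-matrix loop nest above might look like this in C. It is a sketch only: matmul_tiled and min_size are hypothetical names, the arrays are assumed to be flat and row-major with 0-based indices, and the tile size s is assumed to be at least 1.

```c
#include <stddef.h>

/* Smaller of two sizes; clips the last tile when s does not divide n. */
static size_t min_size(size_t x, size_t y)
{
    return x < y ? x : y;
}

/* Tiled C += A * B for n x n row-major matrices, tile size s >= 1.
 * Hypothetical helper for illustration; indices are 0-based. */
void matmul_tiled(size_t n, size_t s,
                  const double *a, const double *b, double *c)
{
    for (size_t it = 0; it < n; it += s) {
        for (size_t kt = 0; kt < n; kt += s) {
            for (size_t jt = 0; jt < n; jt += s) {
                size_t i_end = min_size(it + s, n);
                size_t k_end = min_size(kt + s, n);
                size_t j_end = min_size(jt + s, n);
                /* Multiply one tile of A by one tile of B. */
                for (size_t i = it; i < i_end; i++) {
                    for (size_t k = kt; k < k_end; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jt; j < j_end; j++) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

For example, calling matmul_tiled(3, 2, a, b, c) on 3 × 3 matrices exercises the clipped edge tiles, since 2 does not divide 3.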