|
Spatial Locality
- Instruction count can be reduced, but it is not the only factor in performance
- Performance also depends on the memory access pattern
- Use the cache effectively to improve performance
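- Matrices are stored in row-major order
- With k as the innermost loop, fetching A[i,k] shows good spatial locality (consecutive elements of one row)
- With k as the innermost loop, fetching B[k,j] is slow (each access jumps a full row, a stride of n elements)
- Reorder the loops so that j is the innermost loop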
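The effect of row-major storage on the two fetches can be illustrated with a small offset calculation. This is only a sketch: the helper `row_major_offset` and the size `n = 4` are hypothetical names and values chosen for illustration, and indices are 0-based here, whereas the pseudocode on these slides is 1-based.

```python
def row_major_offset(row, col, n):
    """Offset of element [row, col] in an n x n matrix stored row by row (0-based)."""
    return row * n + col

n = 4        # hypothetical matrix size, for illustration only
i, j = 0, 0  # fix the outer indices and vary the innermost index k

# Innermost loop over k: a[i,k] moves along a row, b[k,j] moves down a column.
a_offsets = [row_major_offset(i, k, n) for k in range(n)]
b_offsets = [row_major_offset(k, j, n) for k in range(n)]
print(a_offsets)  # [0, 1, 2, 3]   stride 1
print(b_offsets)  # [0, 4, 8, 12]  stride n

# Innermost loop over j instead (k fixed): b[k,j] now moves along a row.
k = 1
b_offsets_reordered = [row_major_offset(k, jj, n) for jj in range(n)]
print(b_offsets_reordered)  # [4, 5, 6, 7]  stride 1
```

A stride of 1 means consecutive accesses usually fall in the same cache line, while a stride of n touches a different line on almost every access once n is large.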
Improve Spatial Locality
for i := 1 to n do
  for k := 1 to n do
    for j := 1 to n do
      c[i,j] := c[i,j] + a[i,k] * b[k,j]
    endfor
  endfor
endfor |
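A runnable sketch of the reordered loop nest, assuming square matrices held as lists of rows; the function name `matmul_ikj` is hypothetical. Python lists do not guarantee contiguous storage, so this shows the access order only, not the speed-up you would measure in a language with true row-major arrays.

```python
def matmul_ikj(a, b):
    """Multiply two n x n matrices using the i-k-j loop order."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        row_c = c[i]              # row i of c is reused for every k
        for k in range(n):
            a_ik = a[i][k]        # loaded once, reused across the whole j loop
            row_b = b[k]          # row k of b is walked left to right
            for j in range(n):
                row_c[j] += a_ik * row_b[j]
    return c

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_ikj(a, b))  # [[19.0, 22.0], [43.0, 50.0]]
```

In the innermost loop both `row_c` and `row_b` are traversed with stride 1, which is the point of the reordering.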
|
Temporal Locality
- The previous program has no temporal locality on B for large matrices
- The entire matrix B is fetched again for each i
- If a row of B or C is too large to stay in the cache, it gains nothing from temporal locality
- Use sub-matrix (blocked) multiplication
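One common rule of thumb for choosing the sub-matrix size s is to keep one s x s tile each of A, B and C in the cache at once. The sketch below encodes only that heuristic; the function name `max_tile_size`, the 32 KiB cache size and the 8-byte element size are assumptions for illustration and do not come from these slides.

```python
import math

def max_tile_size(cache_bytes, element_bytes=8):
    """Largest s with 3 * s * s * element_bytes <= cache_bytes (heuristic only)."""
    return math.isqrt(cache_bytes // (3 * element_bytes))

# Assumed example: a 32 KiB cache and 8-byte floating-point elements.
s = max_tile_size(32 * 1024)
print(s)  # 36
```

Real caches have associativity and line-size effects that this estimate ignores, so in practice s is usually tuned by measurement.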
Improve Temporal Locality
for it := 1 to n by s do
  for kt := 1 to n by s do
    for jt := 1 to n by s do
      for i := it to min(it+s-1, n) do
        for k := kt to min(kt+s-1, n) do
          for j := jt to min(jt+s-1, n) do
            c[i,j] := c[i,j] + a[i,k] * b[k,j]
          endfor
        endfor
      endfor
    endfor
  endfor
endfor |
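The blocked loop nest can be written as a runnable sketch and checked against a direct triple sum. The name `matmul_blocked` and the test matrices are hypothetical; indices are 0-based, so the tile bound `min(it+s-1, n)` of the pseudocode becomes the exclusive bound `min(it + s, n)`.

```python
def matmul_blocked(a, b, s):
    """Multiply two n x n matrices tile by tile, with tiles of size s x s."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for it in range(0, n, s):
        for kt in range(0, n, s):
            for jt in range(0, n, s):
                # Multiply tile (it, kt) of a by tile (kt, jt) of b
                # and accumulate into tile (it, jt) of c.
                for i in range(it, min(it + s, n)):
                    for k in range(kt, min(kt + s, n)):
                        a_ik = a[i][k]
                        for j in range(jt, min(jt + s, n)):
                            c[i][j] += a_ik * b[k][j]
    return c

# Small check with n not divisible by s, so the min(...) bounds are exercised.
n = 5
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i * j + 1) for j in range(n)] for i in range(n)]
reference = [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
print(matmul_blocked(a, b, 2) == reference)  # True
```

Each tile of b is reused for every row i inside the tile before the next tile is loaded, which is where the temporal locality comes from.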
|