Module 18: Loop Optimizations
  Lecture 36: Cycle Shrinking
 


Loop Fusion

  • Two adjacent countable loops with the same loop limits can sometimes be fused into a single loop
  • Fusion reduces the cost of the loop test and branch
  • Fusing loops that refer to the same data enhances temporal locality
    • This can have a significant impact on cache and virtual memory performance
  • Fusion may increase the size of the loop body, which can reduce instruction locality (noticeable with very small instruction caches)
  • Fusion is legal only if all dependence relations are preserved
  • Before fusion, all dependence relations between the two loops must flow from body1 to body2 (unless carried by an outer loop); fusion must not reverse any of them

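Example (statements are numbered by line, S1 to S9):

for i = 1, n
    A[i] = B[i] + 1          (S2)
endfor
for i = 1, n
    C[i] = A[i] / 2          (S5)
endfor
for i = 1, n
    D[i] = 1 / C[i+1]        (S8)
endfor

Dependences: S2 → S5 (through A) and S5 → S8 (through C)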

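for i = 1, n
    A[i] = B[i] + 1
    C[i] = A[i] / 2
    D[i] = 1 / C[i+1]
endfor

After fusing all three loops, the second dependence (S5 to S8) is violated: D[i] reads C[i+1] in iteration i, before C[i+1] is written in iteration i+1.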

Fusing only the first two loops preserves both dependences:

for i = 1, n
    A[i] = B[i] + 1
    C[i] = A[i] / 2
endfor
for i = 1, n
    D[i] = 1 / C[i+1]
endfor
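
The effect of the illegal fusion can be checked with a small experiment. The Python sketch below is not part of the lecture: the function names, the 1-based list layout (index 0 unused), and the sample data are assumptions chosen for illustration. It runs the three separate loops, the fully fused loop, and the partially fused version on the same input.

```python
def run_separate(b, c_init, n):
    """Three separate loops, as in the original program."""
    a = [0.0] * (n + 2)
    c = list(c_init)              # copy so the caller's list is not modified
    d = [0.0] * (n + 2)
    for i in range(1, n + 1):
        a[i] = b[i] + 1
    for i in range(1, n + 1):
        c[i] = a[i] / 2
    for i in range(1, n + 1):
        d[i] = 1 / c[i + 1]
    return d[1:n + 1]


def run_fully_fused(b, c_init, n):
    """All three bodies in one loop (illegal: breaks the C dependence)."""
    a = [0.0] * (n + 2)
    c = list(c_init)
    d = [0.0] * (n + 2)
    for i in range(1, n + 1):
        a[i] = b[i] + 1
        c[i] = a[i] / 2
        d[i] = 1 / c[i + 1]       # reads C[i+1] before iteration i+1 writes it
    return d[1:n + 1]


def run_partially_fused(b, c_init, n):
    """Only the first two loops fused (legal)."""
    a = [0.0] * (n + 2)
    c = list(c_init)
    d = [0.0] * (n + 2)
    for i in range(1, n + 1):
        a[i] = b[i] + 1
        c[i] = a[i] / 2
    for i in range(1, n + 1):
        d[i] = 1 / c[i + 1]
    return d[1:n + 1]


# Hypothetical sample data: index 0 is padding, C starts out as all ones.
N = 4
B = [0, 1, 3, 5, 7]
C0 = [1.0] * (N + 2)

print(run_separate(B, C0, N))
print(run_fully_fused(B, C0, N))
print(run_partially_fused(B, C0, N))
```

With this data the fully fused loop computes D from the old contents of C, so its result differs from the separate loops, while the partially fused version matches them.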
     
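Loops with different limits can sometimes be fused after aligning them.

Original loops:

for i = 1, 99
    A[i] = B[i] + 1
endfor
for i = 1, 98
    C[i] = A[i+1] * 2
endfor

Peel the first iteration of the first loop so that both loops run 98 times:

A[1] = B[1] + 1
for i = 2, 99
    A[i] = B[i] + 1
endfor
for i = 1, 98
    C[i] = A[i+1] * 2
endfor

Fuse the loops on a common index j (j = i - 2 in the first body, j = i - 1 in the second):

A[1] = B[1] + 1
for j = 0, 97
    A[j+2] = B[j+2] + 1
    C[j+1] = A[j+2] * 2
endfor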
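
The peel-and-fuse transformation above can be checked directly. This is a minimal Python sketch, not taken from the lecture: the function names, the 1-based list layout (index 0 unused), and the input values are assumptions. It runs the original pair of loops and the peeled, fused loop, then compares the resulting arrays.

```python
def original_loops(b):
    """Two loops with different limits (1..99 and 1..98)."""
    a = [0] * 100                 # indices 1..99 used, index 0 is padding
    c = [0] * 100
    for i in range(1, 100):
        a[i] = b[i] + 1
    for i in range(1, 99):
        c[i] = a[i + 1] * 2
    return a, c


def peeled_and_fused(b):
    """First iteration peeled, remaining 98 iterations fused on index j."""
    a = [0] * 100
    c = [0] * 100
    a[1] = b[1] + 1               # peeled iteration of the first loop
    for j in range(0, 98):        # j = 0..97
        a[j + 2] = b[j + 2] + 1
        c[j + 1] = a[j + 2] * 2   # A[j+2] was written just above
    return a, c


# Hypothetical input: B[i] = i for i in 0..99.
B_ALIGN = list(range(100))
print(original_loops(B_ALIGN) == peeled_and_fused(B_ALIGN))
```

The fusion is safe here because each C[j+1] reads the element A[j+2] written earlier in the same iteration, so the dependence still flows from the first body to the second.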


Loop Fission

  • A single loop may be broken into several smaller loops (the inverse of loop fusion)
  • Useful on machines that have a very small instruction cache
  • Can improve memory locality
  • To apply fission, construct a statement-level dependence graph of the loop body
    • Dependence relations carried by an outer loop need not be preserved
    • Inner loops are treated as single nodes
    • If the graph has no cycles, fission can divide the loop into separate loops, one around each node
    • The resulting loops are placed in a topological order of the dependence graph
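
The ordering step can be sketched with a topological sort. The Python sketch below is an illustration under assumptions, not the lecture's algorithm: the function name `fission_order`, the statement labels, and the representation of dependences as (source, sink) pairs are hypothetical. It uses `graphlib.TopologicalSorter` from the standard library (Python 3.9 or later) and handles only the acyclic case described above.

```python
from graphlib import CycleError, TopologicalSorter


def fission_order(statements, deps):
    """Return an order for the split loops, or None if the graph has a cycle.

    statements: labels of the nodes in the loop body
    deps: iterable of (source, sink) pairs, meaning sink depends on source
    """
    sorter = TopologicalSorter({s: set() for s in statements})
    for source, sink in deps:
        sorter.add(sink, source)  # sink must come after source
    try:
        return list(sorter.static_order())
    except CycleError:
        return None               # cycle: the simple per-node split does not apply


# Hypothetical loop body with a chain of dependences S1 -> S2 -> S3.
print(fission_order(["S1", "S2", "S3"], [("S1", "S2"), ("S2", "S3")]))

# Hypothetical loop body whose two statements depend on each other.
print(fission_order(["S1", "S2"], [("S1", "S2"), ("S2", "S1")]))
```

Each label in the returned list would become its own loop, emitted in that order. When the function returns None, the statements on the cycle have to stay together in one loop, which this sketch does not attempt.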