Module 3: "Recap: Single-threaded Execution"
  Lecture 5: "Pipelining and Hazards"
 

 

Control hazard

  • Branches pose a problem
  • Two pipeline bubbles per branch, which increases the average CPI
  • Can we reduce it to one bubble?

Branch delay slot

  • MIPS R3000 has one bubble
    • Called branch delay slot
    • Exploit clock cycle phases
    • On the positive half compute branch condition
    • On the negative half fetch the target
  • The PC update hardware (selection between target and next PC) works on the falling edge
  • Can we utilize the branch delay slot?
    • Ask the compiler writer
    • The delay slot is always executed (irrespective of the fate of the branch)
    • Boost (hoist) instructions common to both the fall-through and target paths into the delay slot
    • Such instructions cannot always be found
    • You also have to be careful
    • Must boost something that does not alter the outcome of the fall-through or target basic blocks
    • If the branch delay (BD) slot is filled with a useful instruction then we don’t lose anything in CPI; otherwise we pay a branch penalty of one cycle
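
The CPI effect of the bubbles and of a filled delay slot can be checked with a little arithmetic. The sketch below is only an illustration: the function name `average_cpi`, the 20% branch fraction, the base CPI of 1, and the 50% fill rate are all assumed values, not figures from the lecture.

```python
def average_cpi(base_cpi, branch_fraction, bubbles_per_branch, fill_rate=0.0):
    """Average CPI when each branch adds `bubbles_per_branch` stall cycles.

    fill_rate is the (assumed) fraction of delay slots that the compiler
    fills with a useful instruction; a filled slot wastes no cycle.
    """
    wasted_cycles = bubbles_per_branch * (1.0 - fill_rate)
    return base_cpi + branch_fraction * wasted_cycles


# Hypothetical workload: 20% of instructions are branches, ideal CPI is 1.
two_bubbles = average_cpi(1.0, 0.20, 2)        # branch costs two bubbles
one_bubble = average_cpi(1.0, 0.20, 1)         # one delay slot, never filled
half_filled = average_cpi(1.0, 0.20, 1, 0.5)   # delay slot filled half the time

print(two_bubbles, one_bubble, half_filled)
```

Under these assumptions, going from two bubbles to one halves the branch overhead, and every delay slot the compiler manages to fill removes the remaining penalty for that branch.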

What else can we do?

  • Branch prediction
    • We can put a branch target cache in the fetcher
    • Also called branch target buffer (BTB)
    • Use the lower bits of the instruction PC to index the BTB
    • Use the remaining bits to match the tag
    • On a hit, the BTB gives you the target the branch went to the last time it executed
    • You can hope that this is correct and start fetching from that predicted target provided by the BTB
    • One cycle later you get the real target, compare with the predicted target, and throw away the fetched instruction in case of misprediction; keep going if predicted correctly
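
The BTB lookup described above can be sketched as a small direct-mapped table. This is a minimal model, not the R3000 design or any real fetcher: the class name `BranchTargetBuffer`, the helper `fetch_step`, the 16-entry size, and the assumption of 4-byte aligned instructions are all illustrative choices.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB: low PC bits select the entry, the rest form the tag."""

    def __init__(self, index_bits=4):
        self.index_bits = index_bits
        self.entries = [None] * (1 << index_bits)  # each entry: (tag, target)

    def _split(self, pc):
        word = pc >> 2  # assumption: instructions are 4-byte aligned
        index = word & ((1 << self.index_bits) - 1)
        tag = word >> self.index_bits
        return index, tag

    def predict(self, pc):
        """Return the last recorded target on a hit, or None on a miss."""
        index, tag = self._split(pc)
        entry = self.entries[index]
        if entry is not None and entry[0] == tag:
            return entry[1]
        return None

    def update(self, pc, actual_target):
        """Record the resolved target once the branch has executed."""
        index, tag = self._split(pc)
        self.entries[index] = (tag, actual_target)


def fetch_step(btb, pc, actual_target):
    """Model one taken branch at `pc`.

    Returns (predicted_target, squash). `squash` is True when the BTB hit
    but supplied the wrong target, so the instruction fetched from the
    predicted target must be thrown away.
    """
    predicted = btb.predict(pc)
    squash = predicted is not None and predicted != actual_target
    btb.update(pc, actual_target)
    return predicted, squash


btb = BranchTargetBuffer()
print(fetch_step(btb, 0x400, 0x800))  # first encounter: BTB miss
print(fetch_step(btb, 0x400, 0x800))  # same target again: hit, no squash
print(fetch_step(btb, 0x400, 0x900))  # target changed: hit, but squash
```

The tag comparison matters because two branches whose PCs share the same low bits map to the same entry; without the tag, one branch would be steered to the other's target.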

Branch prediction

  • The BTB works well for
    • Loop branches
    • Subroutine calls
    • Unconditional branches
  • Conditional branch prediction
    • Rather dynamic in nature
    • The last target is not very helpful in general (e.g., an if-then-else branch whose direction changes from one execution to the next)
    • Need a direction predictor (predicts taken or not taken)
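    • Once the direction prediction is available we can pick the fetch address: the branch target if predicted taken, the next sequential PC otherwise
  • Return address stack (RAS): push the return address on a call, pop it on a return to predict the return target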
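
The direction predictor and the RAS can be sketched as follows. The lecture does not say which direction predictor to use; the table of 2-bit saturating counters below is one common scheme, chosen here only as an illustration. The class names, table size, initial counter value, stack depth, and the overflow policy (drop the oldest entry) are all assumptions.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by the low PC bits."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        # Counter values 0..3; 0-1 predict not taken, 2-3 predict taken.
        self.counters = [1] * (1 << index_bits)  # start weakly not taken

    def _index(self, pc):
        return (pc >> 2) & self.mask  # assumption: 4-byte aligned instructions

    def predict(self, pc):
        """Return True if the branch at `pc` is predicted taken."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Move the counter toward the actual outcome, saturating at 0 and 3."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


class ReturnAddressStack:
    """Fixed-depth stack: push on a call, pop on a return."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def push(self, return_pc):
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # assumed overflow policy: drop the oldest entry
        self.stack.append(return_pc)

    def pop(self):
        """Return the predicted return address, or None if the stack is empty."""
        return self.stack.pop() if self.stack else None


predictor = TwoBitPredictor()
for outcome in (True, True, False, True):  # a mostly-taken branch
    predictor.update(0x100, outcome)
print(predictor.predict(0x100))

ras = ReturnAddressStack()
ras.push(0x104)  # outer call
ras.push(0x204)  # nested call
print(hex(ras.pop()), hex(ras.pop()))
```

With two bits of state, a single not-taken outcome in a mostly-taken branch (such as a loop exit) does not flip the prediction. The RAS handles returns, where the target depends on the call site and the last target stored in a BTB would often be wrong.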