Control hazard
- Branches pose a problem
- Each branch costs two pipeline bubbles, which increases the average CPI
- Can we reduce it to one bubble?
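The effect of bubbles on CPI can be sketched with a small calculation. This is only an illustration: the function name `average_cpi`, the ideal CPI of 1.0, and the 20% branch fraction are assumptions chosen for the example, not figures from the lecture.

```python
def average_cpi(base_cpi, branch_fraction, bubbles_per_branch):
    """Average CPI when every branch inserts a fixed number of bubbles."""
    return base_cpi + branch_fraction * bubbles_per_branch


# Assumed workload (illustrative only): ideal CPI 1.0, 20% of instructions are branches.
cpi_two_bubbles = average_cpi(1.0, 0.20, 2)
cpi_one_bubble = average_cpi(1.0, 0.20, 1)

print(f"two bubbles: {cpi_two_bubbles:.2f}")
print(f"one bubble:  {cpi_one_bubble:.2f}")
```

Under these assumed numbers, going from two bubbles to one removes half of the branch overhead, which is the motivation for the question above.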
Branch delay slot
- MIPS R3000 has one bubble
- Called branch delay slot
- Exploit clock cycle phases
- On the positive half compute branch condition
- On the negative half fetch the target
- The PC update hardware (selection between target and next PC) operates on the falling edge of the clock
- Can we utilize the branch delay slot?
- Ask the compiler guy
- The delay slot is always executed (irrespective of the fate of the branch)
- Boost (hoist) an instruction common to both the fall-through and target paths into the delay slot
- It is not always possible to find such an instruction
- You have to be careful also
- Must boost something that does not alter the outcome of fall-through or target basic blocks
- If the BD slot is filled with a useful instruction then we don’t lose anything in CPI; otherwise we pay a branch penalty of one cycle
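The "always executed" rule can be made concrete with a toy interpreter. This is a minimal sketch, not MIPS itself: the tuple instruction format, the `run` function, and the register names are hypothetical, and a branch placed inside a delay slot is not handled.

```python
def run(program, regs, max_steps=100):
    """Run a toy program with one branch delay slot; return instructions executed."""
    pc = 0
    pending_target = None  # set by a taken branch, applied after the delay slot
    executed = 0
    while pc < len(program) and executed < max_steps:
        op = program[pc]
        next_pc = pc + 1
        if pending_target is not None:
            # This instruction sits in the delay slot: it still executes,
            # and only afterwards does control move to the branch target.
            next_pc = pending_target
            pending_target = None
        if op[0] == "addi":
            _, rd, rs, imm = op
            regs[rd] = regs[rs] + imm
        elif op[0] == "beq":
            _, rs, rt, target = op
            if regs[rs] == regs[rt]:
                pending_target = target
        executed += 1
        pc = next_pc
    return executed


program = [
    ("beq", "r1", "r0", 3),     # 0: branch to index 3 if r1 == 0
    ("addi", "r2", "r2", 1),    # 1: delay slot, executes on both outcomes
    ("addi", "r3", "r3", 10),   # 2: fall-through path only
    ("addi", "r4", "r4", 100),  # 3: branch target
]

taken = {"r0": 0, "r1": 0, "r2": 0, "r3": 0, "r4": 0}
not_taken = {"r0": 0, "r1": 5, "r2": 0, "r3": 0, "r4": 0}
taken_count = run(program, taken)
not_taken_count = run(program, not_taken)
```

In both runs `r2` is incremented, because the instruction at index 1 occupies the delay slot; only `r3` depends on the branch outcome. Replacing that slot instruction with a no-op would keep the results correct but waste the cycle.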
What else can we do?
- Branch prediction
- We can put a branch target cache in the fetcher
- Also called branch target buffer (BTB)
- Use the lower bits of the instruction PC to index the BTB
- Use the remaining bits to match the tag
- On a hit, the BTB supplies the target the branch had the last time it executed
- You can hope that this is correct and start fetching from that predicted target provided by the BTB
- One cycle later you get the real target, compare with the predicted target, and throw away the fetched instruction in case of misprediction; keep going if predicted correctly
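The index/tag lookup described above can be sketched as a direct-mapped table. This is a minimal model under stated assumptions: the class name `BranchTargetBuffer`, the 16-entry size, and dropping the two byte-offset bits of a 4-byte-aligned PC before indexing are choices made for the example, not details given in the lecture.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB sketch: low PC bits index, remaining bits form the tag."""

    def __init__(self, index_bits=4):
        self.index_bits = index_bits
        self.entries = [None] * (1 << index_bits)  # each entry is (tag, target)

    def _split(self, pc):
        word = pc >> 2  # assumption: 4-byte instructions, so drop the byte offset
        index = word & ((1 << self.index_bits) - 1)
        tag = word >> self.index_bits
        return index, tag

    def predict(self, pc):
        """Return the last recorded target on a hit, or None on a miss."""
        index, tag = self._split(pc)
        entry = self.entries[index]
        if entry is not None and entry[0] == tag:
            return entry[1]
        return None

    def update(self, pc, actual_target):
        """Record the real target once the branch has resolved."""
        index, tag = self._split(pc)
        self.entries[index] = (tag, actual_target)


btb = BranchTargetBuffer()
first_lookup = btb.predict(0x400100)   # miss: nothing recorded yet
btb.update(0x400100, 0x400080)         # branch resolves, remember its target
second_lookup = btb.predict(0x400100)  # hit: predicted target is the last target
alias_lookup = btb.predict(0x400140)   # same index, different tag: miss
```

The fetcher would start fetching from `second_lookup` and compare it with the real target one cycle later; on a mismatch it discards the fetched instruction and calls `update` with the correct target.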
Branch prediction
- BTB will work great for
- Loop branches
- Subroutine calls
- Unconditional branches
- Conditional branch prediction
- Rather dynamic in nature
- The last target is not very helpful in general (if-then-else)
- Need a direction predictor (predicts taken or not taken)
- Once that prediction is available we can compute the target
- Return address stack (RAS) for predicting return targets: push the return address on a call, pop it on a return
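The two structures named above can be sketched as follows. The lecture does not say which direction predictor to use; the 2-bit saturating counter table here is one common choice, swapped in as an assumption. The class names, table size, and stack depth are also hypothetical.

```python
class TwoBitPredictor:
    """Direction predictor sketch: a table of 2-bit saturating counters (assumed scheme)."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # values 0..3, start weakly not taken

    def predict(self, pc):
        """Return True for predicted taken, False for predicted not taken."""
        return self.counters[(pc >> 2) & self.mask] >= 2

    def update(self, pc, taken):
        """Move the counter toward the actual outcome, saturating at 0 and 3."""
        i = (pc >> 2) & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


class ReturnAddressStack:
    """RAS sketch with a push/pop interface and a fixed depth."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def push(self, return_pc):
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # when full, drop the oldest entry
        self.stack.append(return_pc)

    def pop(self):
        """Return the predicted return address, or None if the stack is empty."""
        return self.stack.pop() if self.stack else None


predictor = TwoBitPredictor()
before = predictor.predict(0x400100)        # counter is 1: predicted not taken
predictor.update(0x400100, True)
predictor.update(0x400100, True)
predictor.update(0x400100, False)           # one not-taken outcome does not flip it
after = predictor.predict(0x400100)

ras = ReturnAddressStack()
ras.push(0x104)                             # outer call
ras.push(0x208)                             # nested call
returns = [ras.pop(), ras.pop(), ras.pop()]
```

Once the direction is predicted taken, the target can come from the BTB or be computed from the instruction; for returns, the popped address serves as the predicted target, which is why a last-target BTB alone handles them poorly when a subroutine is called from several places.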