Branch prediction
- Branches are predicted and the targets of unconditional jumps are computed in stage 2
- A taken branch or jump therefore always costs a one-cycle fetch bubble (up to four instructions)
- In case of branch misprediction (which will be detected later) the processor may need to roll back and restart fetching from the correct target
- Need to checkpoint (i.e. save) the register map right after the branch is renamed, so that it can be restored in case of misprediction
- The processor supports at most four register map checkpoints; these are stored in a structure called the branch stack (really, it is a FIFO queue, not a stack)
- It can therefore support up to four in-flight branches
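The checkpoint-and-restore idea above can be sketched in a few lines. This is only an illustrative model: the class name `BranchStack`, its methods, the use of a plain dictionary as the register map, and the policy for freeing a checkpoint when a branch resolves correctly are all assumptions, not details given in these notes.

```python
# Minimal sketch of a four-entry branch stack holding register map checkpoints.
# All names here are hypothetical; the notes only state the capacity and purpose.

class BranchStack:
    CAPACITY = 4  # at most four checkpoints, hence four in-flight branches

    def __init__(self):
        self.entries = []  # (branch_id, saved_map), oldest first

    def can_rename_branch(self):
        # A fifth branch would have to wait until a checkpoint is freed.
        return len(self.entries) < self.CAPACITY

    def checkpoint(self, branch_id, map_table):
        # Save a copy of the map right after the branch is renamed.
        if not self.can_rename_branch():
            raise RuntimeError("branch stack full")
        self.entries.append((branch_id, dict(map_table)))

    def _position(self, branch_id):
        for i, (bid, _) in enumerate(self.entries):
            if bid == branch_id:
                return i
        raise KeyError(branch_id)

    def resolve_correct(self, branch_id):
        # Assumed policy: a correctly predicted branch just frees its checkpoint.
        del self.entries[self._position(branch_id)]

    def mispredict(self, branch_id):
        # Return the saved map and discard this branch and all younger ones.
        pos = self._position(branch_id)
        saved = self.entries[pos][1]
        del self.entries[pos:]
        return saved
```

For example, if the map is checkpointed for a branch `b0`, modified by younger instructions, and `b0` then turns out to be mispredicted, `mispredict("b0")` hands back the map exactly as it was when `b0` was renamed.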
Branch predictor
- The predictor is an array of 512 two-bit saturating counters
- Can count up to 3; if already 3, an increment does not have any effect (remains at 3)
- Similarly, if the count is 0, a decrement does not have any effect (remains at 0)
- The array is indexed by PC[11:3]
- Ignore lower 3 bits, take the next 9 bits
- The prediction is derived from the count at that index of the predictor
- If count >= 2 then predict taken; else not taken
- Very simple algorithm; prediction accuracy of over 85% on most benchmarks; works fine for short pipelines
- Commonly known as a bimodal branch predictor
- The branch predictor is updated when a conditional branch retires (in-order update because retirement is in-order)
- At retirement we know the correct outcome of the branch
- So we use that to train the predictor
- If the branch is taken the count at the index for that branch is incremented (remains at 3 if already 3)
- If the branch is not taken the count is decremented (remains at zero if already 0)
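- This predictor will fail to predict many simple patterns; for example, a strictly alternating branch (taken, not taken, taken, ...) is predicted correctly only 50% or even 0% of the time, depending on where the count starts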
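The predictor described above is small enough to model directly. The following is a rough sketch, not the actual hardware: the names `BimodalPredictor` and `accuracy` are made up for illustration, the initial counter value is an assumption (the notes do not give a reset value), and the model updates a counter immediately after each prediction, whereas the real update happens later at retirement.

```python
# Sketch of a 512-entry bimodal predictor with two-bit saturating counters.
# Names and the initial counter value are illustrative assumptions.

class BimodalPredictor:
    ENTRIES = 512    # 2**9 counters
    MAX_COUNT = 3    # two-bit counter saturates at 3

    def __init__(self, initial_count=0):
        self.counters = [initial_count] * self.ENTRIES

    @staticmethod
    def index(pc):
        # PC[11:3]: ignore the lower 3 bits, take the next 9 bits.
        return (pc >> 3) & 0x1FF

    def predict(self, pc):
        # Count >= 2 means predict taken.
        return self.counters[self.index(pc)] >= 2

    def update(self, pc, taken):
        # Train with the actual outcome; saturate at 0 and 3.
        i = self.index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, self.MAX_COUNT)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)


def accuracy(predictor, pc, outcomes):
    """Fraction of outcomes predicted correctly for one branch at pc."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)  # simplification: update right away
    return correct / len(outcomes)
```

With this model, an alternating branch that starts with a taken outcome is never predicted correctly if the counter starts at 1, and is predicted correctly half the time if it starts at 2. A branch that is always taken is mispredicted only while the counter climbs from its starting value to 2.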
Register renaming
- Takes place in the second pipeline stage
- As we have discussed, every destination is assigned a new physical register from the free list
- The sources are renamed using the existing map (read before this instruction's destination mapping is applied)
- The map table is then updated with the newly renamed destination
- For every destination physical register, a busy bit is set high to signify that the value in this register is not yet ready; this bit is cleared after the instruction completes execution
- The integer and floating-point instructions are assigned registers from two separate free lists
- The integer and fp register files are separate (each has 64 registers)
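The renaming steps above can be illustrated with a small model. Treat it as a sketch under stated assumptions: the class name `Renamer`, the count of 32 logical registers, and the identity initial mapping are not taken from these notes, and returning old physical registers to the free list at retirement is left out.

```python
# Sketch of renaming with a map table, a free list, and busy bits.
# 64 physical registers comes from the notes; 32 logical registers and the
# identity initial mapping are assumptions made for illustration.
from collections import deque

class Renamer:
    def __init__(self, num_logical=32, num_physical=64):
        self.map_table = list(range(num_logical))          # logical -> physical
        self.free_list = deque(range(num_logical, num_physical))
        self.busy = [False] * num_physical                 # True = value not ready

    def rename(self, dest, sources):
        # Sources use the existing map, so "r1 = r1 + r2" reads the old r1.
        phys_sources = [self.map_table[s] for s in sources]
        if not self.free_list:
            raise RuntimeError("no free physical register; rename must stall")
        # The destination gets a new physical register from the free list.
        phys_dest = self.free_list.popleft()
        self.map_table[dest] = phys_dest
        self.busy[phys_dest] = True   # value not yet produced
        return phys_dest, phys_sources

    def complete(self, phys_dest):
        # Cleared once the instruction completes execution.
        self.busy[phys_dest] = False


# Separate instances model the separate integer and fp free lists.
int_renamer = Renamer()
fp_renamer = Renamer()
```

Renaming `r1 = r1 + r2` on a fresh `int_renamer` maps the sources to physical registers 1 and 2 and the destination to physical register 32; a later instruction that reads `r1` then picks up physical register 32 as its source.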