Module 5: "MIPS R10000: A Case Study"
  Lecture 9: "MIPS R10000: A Case Study"
 

Branch prediction

  • Branches are predicted and unconditional jumps are computed in stage 2
    • A taken branch or jump therefore incurs a one-cycle fetch bubble (four instruction slots, since up to four instructions are fetched per cycle)
  • In case of branch misprediction (which will be detected later) the processor may need to roll back and restart fetching from the correct target
    • Need to checkpoint (i.e. save) the register map right after the branch is renamed; the checkpoint is used to restore the map in case of misprediction
  • The processor supports at most four register map checkpoints; these are stored in a structure called the branch stack (really, it is a FIFO queue, not a stack)
    • Can support up to four in-flight branches
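
The checkpointing described above can be sketched as a small model. This is only an illustrative sketch, not the R10000's actual hardware organization: the class name `BranchStack`, the method names, the branch identifiers, and the use of a dictionary for the register map are all assumptions made for the example. Only the limit of four checkpoints comes from the text.

```python
MAX_CHECKPOINTS = 4  # from the text: at most four register map checkpoints


class BranchStack:
    """Hypothetical model: an ordered list of (branch_id, saved_map) entries."""

    def __init__(self):
        self.entries = []  # oldest in-flight branch first

    def can_accept_branch(self):
        # A fifth in-flight branch would have to wait until an entry is freed.
        return len(self.entries) < MAX_CHECKPOINTS

    def checkpoint(self, branch_id, register_map):
        """Save a copy of the register map right after the branch is renamed."""
        if not self.can_accept_branch():
            raise RuntimeError("branch stack full: four branches already in flight")
        self.entries.append((branch_id, dict(register_map)))

    def resolve(self, branch_id, mispredicted):
        """Free the checkpoint of a resolved branch.

        On a misprediction, return the saved map and also drop the
        checkpoints of all younger branches, since they are on the wrong path.
        """
        pos = [bid for bid, _ in self.entries].index(branch_id)
        if not mispredicted:
            del self.entries[pos]
            return None
        saved_map = self.entries[pos][1]
        del self.entries[pos:]
        return saved_map
```

For example, if branch `b0` is checkpointed with `r1` mapped to physical register 10 and a younger branch `b1` is checkpointed later, a misprediction of `b0` returns the map in which `r1` still maps to 10 and discards the entry for `b1` as well.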

Branch predictor

  • The predictor is an array of 512 two-bit saturating counters
    • Can count up to 3; if already 3, an increment does not have any effect (remains at 3)
    • Similarly, if the count is 0, a decrement does not have any effect (remains at 0)
  • The array is indexed by PC[11:3]
    • Ignore lower 3 bits, take the next 9 bits
    • The outcome is the count at that index of the predictor
  • If count >= 2 then predict taken; else not taken
  • Very simple algorithm; prediction accuracy is 85% or higher on most benchmarks, which works fine for short pipelines
  • Commonly known as bimodal branch predictor
  • The branch predictor is updated when a conditional branch retires (in-order update because retirement is in-order)
    • At retirement we know the correct outcome of the branch
    • So we use that to train the predictor
    • If the branch is taken the counter at that branch's index is incremented (remains at 3 if already 3)
    • If the branch is not taken the count is decremented (remains at zero if already 0)
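  • This predictor will fail to predict many simple patterns, including alternating branches, depending on where the count starts (e.g. a strictly alternating taken/not-taken branch is mispredicted every time if its counter oscillates between 1 and 2)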
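
The indexing, prediction, and update rules above can be put together in a short simulation. This is a minimal sketch under stated assumptions: the class and function names are made up for illustration, the initial counter value is a parameter because the text does not specify it, and the counter is updated immediately after each prediction, which ignores the delay between prediction in stage 2 and the update at retirement.

```python
NUM_COUNTERS = 512  # from the text: 512 two-bit saturating counters
COUNTER_MAX = 3     # a two-bit counter saturates at 3


class BimodalPredictor:
    def __init__(self, initial_count=0):
        # initial_count is an assumption; the text does not give a reset value
        self.counters = [initial_count] * NUM_COUNTERS

    @staticmethod
    def index(pc):
        # PC[11:3]: drop the low 3 bits, keep the next 9 bits
        return (pc >> 3) & 0x1FF

    def predict(self, pc):
        # count >= 2 means predict taken
        return self.counters[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        if taken:
            self.counters[i] = min(COUNTER_MAX, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(initial_count, outcomes, pc=0x1000):
    """Fraction of correct predictions for one branch with the given outcomes."""
    predictor = BimodalPredictor(initial_count)
    correct = 0
    for taken in outcomes:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(outcomes)


alternating = [True, False] * 50  # taken, not taken, taken, ...
always_taken = [True] * 100
```

With these definitions, `accuracy(1, alternating)` is 0.0 because the counter bounces between 1 and 2 and is always on the wrong side of the threshold, while `accuracy(2, alternating)` is 0.5. For `always_taken`, a counter starting at 0 mispredicts only the first two outcomes. Note also that `index(0x1000)` and `index(0x2000)` are equal, so two branches 4 KB apart share a counter in this scheme.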

Register renaming

  • Takes place in the second pipeline stage
  • As we have discussed, every destination is assigned a new physical register from the free list
  • The source registers are translated to physical registers using the existing map
  • The map table is then updated with the newly renamed destination
  • For every destination physical register, a busy bit is set high to signify that the value in this register is not yet ready; this bit is cleared after the instruction completes execution
  • The integer and floating-point instructions are assigned registers from two separate free lists
    • The integer and fp register files are separate (each has 64 registers)
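
The renaming steps listed above can be sketched for a single register file as follows. This is a simplified illustration, not the R10000's actual implementation: the `Renamer` class, its method names, and the initial identity mapping are assumptions, and the sketch omits returning old physical registers to the free list as well as any special handling of register 0. The 64 physical registers come from the text; 32 logical registers is the MIPS integer register count.

```python
from collections import deque

NUM_LOGICAL = 32   # MIPS logical integer registers
NUM_PHYSICAL = 64  # from the text: each register file has 64 registers


class Renamer:
    def __init__(self):
        # Assumption: logical register i initially maps to physical register i.
        self.map_table = list(range(NUM_LOGICAL))
        self.free_list = deque(range(NUM_LOGICAL, NUM_PHYSICAL))
        self.busy = [False] * NUM_PHYSICAL

    def rename(self, dest, sources):
        """Rename one instruction; returns (new_dest_phys, source_phys_list)."""
        # Sources use the existing map, read before the destination is remapped.
        phys_sources = [self.map_table[s] for s in sources]
        if not self.free_list:
            raise RuntimeError("no free physical register: rename must stall")
        new_phys = self.free_list.popleft()
        self.map_table[dest] = new_phys
        self.busy[new_phys] = True  # value not yet produced
        return new_phys, phys_sources

    def complete(self, phys):
        """Clear the busy bit once the producing instruction finishes execution."""
        self.busy[phys] = False
```

For instance, renaming `r3 = r1 + r2` on a fresh `Renamer` returns `(32, [1, 2])`. A following `r3 = r3 + r3` returns `(33, [32, 32])`: its sources read the mapping created by the first instruction, and its destination gets a new physical register, so the two writes to `r3` no longer conflict. A second `Renamer` instance would model the separate floating-point free list.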