Functional units
- Right after an instruction is issued, it reads its source operands (identified by physical register numbers) from the register file (integer or fp, depending on the instruction type)
- From stage 4 onwards the instructions execute
- Two ALUs: branches and shifts execute only on ALU1, multiply/divide only on ALU2, and all other integer instructions can execute on either ALU; ALU1 is also responsible for triggering rollback on a branch misprediction (marks all instructions after the branch as squashed, restores the register map from the corresponding branch stack entry, sets the fetch PC to the correct target)
- Four FPUs: one dedicated to fp multiply, one to fp divide, one to fp square root; most other fp instructions execute on the remaining FPU
- LSU (load/store unit): contains an address calculation ALU; the dTLB is fully associative with 64 entries and translates a 44-bit VA to a 40-bit PA; the PA is matched against the dcache tags (the dcache is virtually indexed, physically tagged)
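The three rollback steps that ALU1 triggers on a misprediction can be sketched as follows. This is a minimal illustration, not the R10k hardware: the class names, the `seq` program-order field, and the dictionary-based register map are all hypothetical, chosen only to make the steps concrete.

```python
from dataclasses import dataclass, field

@dataclass
class Instr:
    seq: int                # program-order sequence number (hypothetical field)
    squashed: bool = False

@dataclass
class BranchStackEntry:
    branch_seq: int         # sequence number of the branch owning this entry
    saved_map: dict         # register map checkpointed when the branch was renamed

@dataclass
class Core:
    reg_map: dict           # logical register -> physical register
    fetch_pc: int
    in_flight: list = field(default_factory=list)

def rollback(core, entry, correct_target):
    """Recover from a mispredicted branch using the three steps in the notes."""
    # 1. Mark every instruction younger than the branch as squashed.
    for ins in core.in_flight:
        if ins.seq > entry.branch_seq:
            ins.squashed = True
    # 2. Restore the register map from the branch stack entry.
    core.reg_map = dict(entry.saved_map)
    # 3. Redirect fetch to the correct target.
    core.fetch_pc = correct_target

# Assumed example state: instruction 2 is the mispredicted branch.
core = Core(reg_map={"r3": "p9", "r6": "p12"}, fetch_pc=0x1040,
            in_flight=[Instr(1), Instr(2), Instr(3), Instr(4)])
entry = BranchStackEntry(branch_seq=2, saved_map={"r3": "p7", "r6": "p12"})
rollback(core, entry, correct_target=0x2000)
```

Instructions 3 and 4 end up squashed, while instruction 1 and the branch itself are left untouched.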
Result writeback
- As soon as an instruction completes execution the result is written back to the destination physical register
- No need to wait till retirement since the renamer has guaranteed that this physical destination is associated with a unique instruction in the pipeline
- Also the results are launched on the bypass network (from outputs of ALU/FPU/dcache to inputs of ALU/FPU/address calculation ALUs)
- This guarantees that dependents can be issued back-to-back and still receive the correct value
- Example: add r3, r4, r5; add r6, r4, r3 (the two can be issued in consecutive cycles; the second add reads a stale value of r3 from the register file, but picks up the correct value from the bypass network)
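The back-to-back add example can be traced with a small sketch. It assumes a simplified timing model in which the second add reads the register file before the first add's result has been written back; the function name and the use of architectural register names instead of physical register numbers are illustrative simplifications.

```python
def issue_back_to_back(regfile, bypass_enabled):
    """Model: add r3, r4, r5 ; add r6, r4, r3 issued in consecutive cycles."""
    rf = dict(regfile)

    # First add executes; its result is not yet in the register file.
    first_result = rf["r4"] + rf["r5"]

    # Second add reads the register file one cycle later: r3 is still stale.
    r4_read = rf["r4"]
    r3_read = rf["r3"]

    # The bypass network carries the first add's result to the ALU input.
    if bypass_enabled:
        r3_operand = first_result   # forwarded value overrides the stale read
    else:
        r3_operand = r3_read        # stale value from the register file

    # Writeback of both results.
    rf["r3"] = first_result
    rf["r6"] = r4_read + r3_operand
    return rf

initial = {"r3": 0, "r4": 10, "r5": 5, "r6": 0}
with_bypass = issue_back_to_back(initial, bypass_enabled=True)
without_bypass = issue_back_to_back(initial, bypass_enabled=False)
print(with_bypass["r6"], without_bypass["r6"])  # prints "25 10"
```

Without forwarding, the second add computes 10 + 0 from the stale r3; with forwarding it computes 10 + 15, which is the architecturally correct result.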
Retirement or commit
- Instructions may not be able to leave the pipe immediately after they finish execution
- In-order retirement is necessary for precise exceptions
- An instruction can retire when it reaches the head of the active list and has completed execution
- R10k retires up to 4 instructions every cycle
- Retirement involves:
  - Updating the branch predictor and freeing its branch stack entry if it is a branch instruction
  - Moving the store value from the speculative store buffer entry to the L1 data cache if it is a store instruction
  - Freeing the old destination physical register and updating the register free list
  - And, finally, freeing the active list entry itself
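In-order retirement from the head of the active list can be sketched as below. This is a simplified model under stated assumptions: the `Entry` fields and `retire_cycle` function are hypothetical names, every instruction is assumed to have a destination register, and the branch-predictor update and store-buffer drain are omitted.

```python
from collections import deque
from dataclasses import dataclass

RETIRE_WIDTH = 4  # up to 4 instructions retire per cycle

@dataclass
class Entry:
    name: str
    done: bool            # has the instruction completed execution?
    old_phys_dest: int    # physical register previously mapped to its destination

def retire_cycle(active_list, free_list):
    """Retire completed instructions from the head, in program order."""
    retired = []
    while (active_list and len(retired) < RETIRE_WIDTH
           and active_list[0].done):
        entry = active_list.popleft()            # free the active list entry
        free_list.append(entry.old_phys_dest)    # recycle the old physical register
        retired.append(entry.name)
    return retired

active = deque([Entry("i0", True, 3), Entry("i1", True, 8),
                Entry("i2", False, 5), Entry("i3", True, 6)])
free = []

first = retire_cycle(active, free)    # stops at i2, which has not completed
active[0].done = True                 # i2 completes in a later cycle
second = retire_cycle(active, free)   # now i2 and i3 can leave
```

Note that i3 finished execution early but cannot retire until i2 does, which is what keeps exceptions precise.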