Overview
- Mid-1990s: one of the first dynamic out-of-order superscalar RISC microprocessors
- 6.8 M transistors on a 298 mm² die (0.35 μm CMOS)
- Of the 6.8 M transistors, 4.4 M (about 65%) are devoted to the L1 instruction and data caches
- Fetches, decodes, and renames 4 instructions every cycle
- 64-bit registers: the data path width is 64 bits
- On-chip L1 instruction and data caches, 32 KB each, 2-way set associative
- Off-chip L2 cache of variable size (512 KB to 16 MB), 2-way set associative, line size 128 bytes
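The L2 parameters above fix the cache geometry. As a rough sketch, the number of sets is size / (ways × line size). The function name `cache_geometry` is hypothetical, and the calculation assumes power-of-two sizes, which holds for the figures quoted above.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, index_bits, offset_bits) for a set-associative cache.

    Assumes size, associativity, and line size are powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    index_bits = sets.bit_length() - 1         # log2(sets)
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    return sets, index_bits, offset_bits


KB = 1024
MB = 1024 * KB

# Smallest and largest L2 configurations listed above (2-way, 128-byte lines)
smallest_l2 = cache_geometry(512 * KB, ways=2, line_bytes=128)
largest_l2 = cache_geometry(16 * MB, ways=2, line_bytes=128)
```

Under these assumptions the smallest L2 has 2048 sets (11 index bits) and the largest has 65536 sets (16 index bits), with 7 offset bits in both cases.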
Stage 1: Fetch
- Instructions are partially pre-decoded when a cache line is brought into the Icache
- This simplifies the decode stage
- The processor fetches four sequential instructions every cycle from the Icache
- The iTLB has eight entries, fully associative
- No branch target buffer (BTB)
- So the fetcher cannot act on branches by itself; it can only fetch sequentially
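Without a BTB, the fetcher's default behaviour reduces to advancing the PC by one fetch group per cycle. A minimal sketch, assuming fixed 4-byte instructions (not stated above) and ignoring cache-line boundaries and Icache or iTLB misses; `fetch_group` is a hypothetical name:

```python
FETCH_WIDTH = 4  # instructions per cycle, as stated above
INSTR_BYTES = 4  # assumption: fixed 32-bit instruction encoding


def fetch_group(pc):
    """Return the addresses fetched this cycle and the default next fetch PC."""
    addrs = [pc + i * INSTR_BYTES for i in range(FETCH_WIDTH)]
    next_pc = pc + FETCH_WIDTH * INSTR_BYTES  # sequential: no target prediction
    return addrs, next_pc


addrs, next_pc = fetch_group(0x1000)
```

Any change to this sequential stream has to come from a later stage overwriting the fetch PC.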
Stage 2: Decode/Rename
- Decodes and renames four instructions every cycle
- The targets of branches, unconditional jumps, and subroutine calls (jump and link, or jal) are computed in this stage
- Unconditional jumps are not fed into the pipeline; the decoder modifies the fetch PC directly
- Conditional branches consult a simple predictor for the branch direction (taken or not taken) and modify the fetch PC accordingly
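The redirect logic of this stage can be sketched as follows. The notes only say "a simple predictor", so the table of 2-bit saturating counters, its size, the PC indexing, and the names `TwoBitPredictor` and `next_fetch_pc` are all assumptions for illustration. The sketch also assumes 4-byte instructions and ignores ISA details such as branch delay slots.

```python
class TwoBitPredictor:
    """Assumed predictor: a table of 2-bit saturating counters indexed by PC."""

    def __init__(self, entries=512):  # table size is an assumption
        self.entries = entries
        self.counters = [1] * entries  # 0,1 predict not taken; 2,3 predict taken

    def _index(self, pc):
        return (pc >> 2) % self.entries  # drop byte offset of 4-byte instructions

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Train the counter once the branch outcome is known."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def next_fetch_pc(kind, pc, target, predictor):
    """Choose the fetch PC after decoding one instruction at address pc."""
    sequential = pc + 4
    if kind in ("jump", "jal"):  # unconditional: always redirect to the target
        return target
    if kind == "branch":         # conditional: follow the predicted direction
        return target if predictor.predict(pc) else sequential
    return sequential            # any other instruction: keep fetching in order


predictor = TwoBitPredictor()
before = next_fetch_pc("branch", 0x100, 0x200, predictor)  # weakly not taken
predictor.update(0x100, taken=True)
after = next_fetch_pc("branch", 0x100, 0x200, predictor)   # now weakly taken
```

In this sketch a single taken outcome is enough to flip a counter from its initial weakly-not-taken state, so `before` is the sequential address and `after` is the branch target.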