Module 5: "MIPS R10000: A Case Study"

Lecture 9: "MIPS R10000: A Case Study"

Overview
	Mid 90s: one of the first dynamic out-of-order superscalar RISC microprocessors
	6.8 M transistors on a 298 mm² die (0.35 μm CMOS)
	Of the 6.8 M transistors, 4.4 M are devoted to the L1 instruction and data caches
	Fetches, decodes, and renames 4 instructions every cycle
	64-bit registers: the data path width is 64 bits
	On-chip L1 instruction and data caches, 32 KB each, 2-way set associative
	Off-chip L2 cache of variable size (512 KB to 16 MB), 2-way set associative, line size 128 bytes

Stage 1: Fetch
	Instructions are slightly pre-decoded when a cache line is brought into the Icache
	This simplifies the decode stage
	The processor fetches four sequential instructions every cycle from the Icache
	The iTLB has eight entries and is fully associative
	There is no BTB
	So the fetcher cannot do anything about branches other than fetch sequentially

Stage 2: Decode/Rename
	Decodes and renames four instructions every cycle
	The targets of branches, unconditional jumps, and subroutine calls (named jump and link, or jal) are computed in this stage
	Unconditional jumps are not fed into the pipeline; the decoder modifies the fetcher PC directly
	Conditional branches look up a simple predictor to predict the branch direction (taken or not taken) and modify the fetch PC accordingly
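
The decode-stage handling of control transfers described above can be sketched in a few lines of Python. This is only an illustration, not the R10000's actual logic. The target formulas follow the standard MIPS encoding (a 16-bit signed word offset for branches, a 26-bit word index for j/jal). The notes only say "a simple predictor", so the table of 2-bit saturating counters, its size of 512 entries, the indexing by low PC bits, and every function and class name here are assumptions made for the example. Delay slots and the four-wide fetch group are ignored.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc, imm16):
    """MIPS conditional branch: offset is in words, relative to pc + 4."""
    return pc + 4 + (sign_extend16(imm16) << 2)


def jump_target(pc, index26):
    """MIPS j/jal: keep the upper bits of pc + 4, replace the low 28 bits."""
    return ((pc + 4) & ~0x0FFFFFFF) | ((index26 & 0x3FFFFFF) << 2)


class TwoBitPredictor:
    """Hypothetical direction predictor: a table of 2-bit saturating counters."""

    def __init__(self, entries=512):  # table size is an assumption
        self.entries = entries
        self.table = [1] * entries  # 0,1 = predict not taken; 2,3 = predict taken

    def _index(self, pc):
        # Drop the two always-zero low bits of a word-aligned PC.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def decode_redirect(pc, kind, field, predictor):
    """Return the new fetch PC chosen at decode, or None to keep fetching sequentially.

    kind is "jump" (j/jal), "branch" (conditional), or "other";
    field is the raw 26-bit index or 16-bit offset from the instruction.
    """
    if kind == "jump":
        # Unconditional: the decoder redirects the fetcher directly.
        return jump_target(pc, field)
    if kind == "branch" and predictor.predict(pc):
        # Predicted taken: redirect to the computed branch target.
        return branch_target(pc, field)
    return None  # predicted not taken, or not a control transfer
```

Because there is no BTB, nothing like `decode_redirect` can run in the fetch stage: the target is only known once the instruction bits have been decoded, so a taken branch or a jump is redirected one stage later than it would be with a BTB.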