
MLP

  Need memory-level parallelism (MLP)
    Simply put, several memory operations must overlap with one another

  Step 1: Non-blocking cache
    Allow multiple outstanding cache misses
    Overlap multiple cache misses with one another
    Supported by all microprocessors today (the Alpha 21364 supported 16 outstanding cache misses)

  Step 2: Out-of-order load issue
    Issue loads out of program order (the addresses of older stores may not yet be known when a load issues)
    How do you know that the load did not issue ahead of an older store to the same address?
    Each store must check for this memory-order violation when it issues

Out-of-order loads

    sw 0(r7), r6
    …   /* other instructions */
    lw r2, 80(r20)

  Assume that the load issues before the store because r20 becomes ready before r6 or r7
  The load first looks up the store buffer (which holds the values of already executed stores until they are committed to the cache at retirement)
  If it misses in the store buffer, it looks up the caches and, say, obtains the value from somewhere in the memory hierarchy
  Several cycles later the store issues, and it turns out that 0(r7)==80(r20) or that the two accesses overlap; now what?
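
The check that an issuing store performs can be illustrated with a small software model. The following is a minimal sketch under stated assumptions, not a description of any real pipeline: the names `MemOp`, `LoadStoreUnit`, `issue_load`, and `issue_store` are hypothetical, every access is assumed to be a 4-byte word, a dictionary stands in for the caches, and the store buffer forwards only on an exact address match. The sketch stops at detecting the violation. What the processor does next (commonly, re-executing the load and its dependents) is outside its scope.

```python
from dataclasses import dataclass


@dataclass
class MemOp:
    seq: int        # program-order sequence number (smaller means older)
    addr: int       # byte address of the access
    size: int = 4   # access width in bytes (assumption: word accesses only)
    value: int = 0  # value stored, or value obtained by a load


def overlaps(a: MemOp, b: MemOp) -> bool:
    """True if the byte ranges of the two accesses intersect."""
    return a.addr < b.addr + b.size and b.addr < a.addr + a.size


class LoadStoreUnit:
    """Hypothetical model of a store buffer plus a record of issued loads."""

    def __init__(self, memory: dict):
        self.memory = memory      # stands in for the cache hierarchy
        self.store_buffer = []    # executed stores that have not yet retired
        self.issued_loads = []    # loads that have already obtained a value

    def issue_load(self, seq: int, addr: int) -> int:
        load = MemOp(seq, addr)
        # Look up the store buffer: older stores to exactly this address.
        older = [s for s in self.store_buffer
                 if s.seq < seq and s.addr == addr]
        if older:
            # Forward from the youngest of the older matching stores.
            load.value = max(older, key=lambda s: s.seq).value
        else:
            # Store-buffer miss: read from the (modelled) caches.
            load.value = self.memory.get(addr, 0)
        self.issued_loads.append(load)
        return load.value

    def issue_store(self, seq: int, addr: int, value: int) -> list:
        store = MemOp(seq, addr, value=value)
        self.store_buffer.append(store)
        # Memory-order check: any younger load that already issued to an
        # overlapping address may have read a stale value.
        return sorted(l.seq for l in self.issued_loads
                      if l.seq > seq and overlaps(l, store))


# Scenario from the slide: sw 0(r7), r6 is older (seq 1) than
# lw r2, 80(r20) (seq 2), but the load issues first.
# Assumed register values: r20 = 0x1000 and r7 = 0x1050, so both
# instructions access address 0x1050.
lsu = LoadStoreUnit({0x1050: 7})
v = lsu.issue_load(seq=2, addr=0x1000 + 80)                  # reads stale 7
violations = lsu.issue_store(seq=1, addr=0x1050, value=42)   # flags load 2
print(v, violations)
```

Because the store searches the issued loads by sequence number, it flags only loads that are younger in program order. A load that is older than the store is never reported, even if it touched the same address.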