Module 4: "Recap: Virtual Memory and Caches"
  Lecture 8: "Cache Hierarchy and Memory-level Parallelism"
 

MLP

  • Need memory-level parallelism (MLP)
    • Simply put, several memory operations need to overlap with each other
  • Step 1: Non-blocking cache
    • Allow multiple outstanding cache misses
    • Overlap the latencies of multiple cache misses with each other
    • Supported by all microprocessors today (e.g., the Alpha 21364 supported 16 outstanding cache misses)
  • Step 2: Out-of-order load issue
    • Issue loads out of program order (even before the addresses of older stores are known)
    • How do you know the load didn’t issue before an older store to the same address? When a store issues, it must check for this memory-order violation
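Step 1 can be sketched as the MSHR (Miss Status Holding Register) bookkeeping behind a non-blocking cache: each outstanding miss occupies an MSHR entry, and later misses to the same block merge into the existing entry instead of stalling the pipeline. This is a hypothetical sketch; the class and method names are illustrative, not from any real design.

```python
class NonBlockingCache:
    """Sketch of non-blocking cache miss tracking via MSHRs."""

    def __init__(self, max_outstanding=16):  # e.g., an Alpha 21364-style limit
        self.max_outstanding = max_outstanding
        self.mshrs = {}   # block address -> list of load ids waiting on the miss
        self.data = set() # block addresses currently present in the cache

    def access(self, load_id, block_addr):
        """Returns 'hit', 'miss' (new MSHR), 'merged', or 'stall'."""
        if block_addr in self.data:
            return "hit"
        if block_addr in self.mshrs:
            self.mshrs[block_addr].append(load_id)  # secondary miss: merge
            return "merged"
        if len(self.mshrs) == self.max_outstanding:
            return "stall"  # all MSHRs busy: the cache must block
        self.mshrs[block_addr] = [load_id]  # primary miss: allocate an MSHR
        return "miss"

    def fill(self, block_addr):
        """Memory returns the block; wake up every merged load."""
        waiters = self.mshrs.pop(block_addr)
        self.data.add(block_addr)
        return waiters

cache = NonBlockingCache(max_outstanding=2)
assert cache.access(1, 0x100) == "miss"    # primary miss: MSHR allocated
assert cache.access(2, 0x200) == "miss"    # second miss overlaps the first
assert cache.access(3, 0x100) == "merged"  # secondary miss to the same block
assert cache.access(4, 0x300) == "stall"   # both MSHRs busy
assert cache.fill(0x100) == [1, 3]         # fill wakes both merged loads
assert cache.access(5, 0x100) == "hit"
```

The key point the sketch captures is that misses 1 and 2 are in flight simultaneously, so their memory latencies overlap rather than serialize.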

Out-of-order loads

sw  r6, 0(r7)
… /* other instructions */
lw  r2, 80(r20)

  • Assume that the load issues before the store because r20 becomes ready before r6 or r7
  • The load accesses the store buffer (which holds already executed store values before they are committed to the cache at retirement)
  • If it misses in the store buffer, it looks up the caches and, say, gets the value from somewhere in the memory hierarchy
  • After several cycles the store issues, and it turns out that 0(r7)==80(r20) or the two accesses overlap; now what?
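The scenario above can be sketched as a load/store queue: a load first searches the store buffer of older, already-executed stores; if no older store with a matching known address is found, it reads the cache and records its address, and when the store's address finally resolves, younger loads to the same address are flagged for squash. The class, its method names, and the sequence numbers are hypothetical illustrations.

```python
class LSQ:
    """Sketch of speculative load issue against a store buffer."""

    def __init__(self):
        self.store_buffer = {}  # store seq -> (address or None, value)
        self.issued_loads = {}  # load seq -> address it actually read

    def issue_store_addr_pending(self, seq, value):
        # Store enters the buffer before its address (0(r7)) is known.
        self.store_buffer[seq] = (None, value)

    def issue_load(self, seq, addr, memory):
        # Search older stores with known addresses, youngest first (forwarding).
        for s_seq in sorted(self.store_buffer, reverse=True):
            s_addr, s_val = self.store_buffer[s_seq]
            if s_seq < seq and s_addr == addr:
                return s_val            # value forwarded from the store buffer
        self.issued_loads[seq] = addr   # remember the address for later checks
        return memory[addr]             # store buffer miss: read the cache

    def resolve_store(self, seq, addr, value):
        # Store address finally computed: check for memory-order violations.
        self.store_buffer[seq] = (addr, value)
        return [l for l, a in self.issued_loads.items()
                if l > seq and a == addr]  # younger loads that must be squashed

mem = {200: 7}                               # stale value at the loaded address
lsq = LSQ()
lsq.issue_store_addr_pending(10, value=99)   # sw: address 0(r7) not yet known
v = lsq.issue_load(20, 200, mem)             # lw issues early, reads the cache
squash = lsq.resolve_store(10, 200, 99)      # 0(r7) turns out to equal 80(r20)
assert v == 7           # the load got the stale value
assert squash == [20]   # the load and its dependents must replay
```

This is exactly the "now what?" of the slide: the load consumed a stale value, so it and everything that depends on it must be squashed and re-executed.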

Load/store ordering

  • Out-of-order load issue relies on speculative memory disambiguation
    • Assumes that there will be no conflicting store
    • If the speculation is correct, you have issued the load much earlier and you have allowed the dependents to also execute much earlier
    • If there is a conflicting store, you have to squash the load and all the dependents that have consumed the load value and re-execute them systematically
    • Turns out that the speculation is correct most of the time
    • To further reduce load squashes, microprocessors use simple memory dependence predictors (which predict whether a load will conflict with a pending store based on the past behavior of that load or of load/store pairs)
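A minimal dependence predictor can be sketched in the spirit of the Alpha 21264's load wait table: a load that has been squashed is predicted to conflict again and is held until older store addresses resolve, and the table is cleared periodically so predictions do not stay conservative forever. The class and sizes below are illustrative assumptions, not the 21264's actual organization.

```python
class LoadWaitTable:
    """Sketch of a per-load memory dependence predictor (wait-bit table)."""

    def __init__(self, size=1024):
        self.size = size
        self.wait_bit = [False] * size  # one bit per (hashed) load PC

    def _index(self, load_pc):
        return load_pc % self.size      # trivial hash of the load's PC

    def predict_conflict(self, load_pc):
        # True -> hold this load until all older store addresses are known.
        return self.wait_bit[self._index(load_pc)]

    def train_on_squash(self, load_pc):
        # The load was squashed by a conflicting store: set its wait bit.
        self.wait_bit[self._index(load_pc)] = True

    def periodic_clear(self):
        # Cleared periodically so stale predictions do not permanently
        # serialize loads that no longer conflict.
        self.wait_bit = [False] * self.size

lwt = LoadWaitTable()
assert not lwt.predict_conflict(0x400)  # first encounter: speculate freely
lwt.train_on_squash(0x400)              # one squash observed
assert lwt.predict_conflict(0x400)      # now the load waits for older stores
lwt.periodic_clear()
assert not lwt.predict_conflict(0x400)  # speculation re-enabled after clear
```

Because speculation is correct most of the time, a single wait bit per load is enough to avoid most repeat squashes at negligible hardware cost.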