MLP
- Need memory-level parallelism (MLP)
- Put simply, need to overlap several memory operations with one another
- Step 1: Non-blocking cache
- Allow multiple outstanding cache misses
- Overlap multiple cache misses with one another (see the code sketch below)
- Supported by all microprocessors today (Alpha 21364 supported 16 outstanding cache misses)
- Step 2: Out-of-order load issue
- Issue loads out of program order (the addresses of older stores may not be known yet when the load issues)
- How do you know the load didn’t issue before an older store to the same address? Issuing stores must check for this memory-order violation
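The following C sketch (a hypothetical example, not from the original notes) illustrates why independent memory operations matter for MLP: the pointer-chasing loop serializes its misses because each load's address depends on the previous load, while the array loop exposes several independent misses that a non-blocking cache can keep outstanding at the same time.

#include <stddef.h>

struct node { struct node *next; long val; };

/* Dependent loads: the address of each load comes from the previous
 * load, so misses are serviced one after another -- essentially no MLP. */
long walk_list(struct node *p) {
    long sum = 0;
    while (p) {
        sum += p->val;
        p = p->next;     /* next address unknown until this load returns */
    }
    return sum;
}

/* Independent loads: all addresses are known up front, so an out-of-order
 * core with a non-blocking cache can overlap several misses per iteration. */
long sum_array(const long *a, size_t n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}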
Out-of-order loads
sw 0(r7), r6
… /* other instructions */
lw r2, 80(r20)
- Assume that the load issues before the store because r20 gets ready before r6 or r7
- The load accesses the store buffer (used for holding already executed store values before they are committed to the cache at retirement)
- If it misses in the store buffer, it looks up the cache hierarchy and, say, gets a value from somewhere
- After several cycles the store issues and it turns out that 0(r7) == 80(r20), or the two accesses overlap; now what? (The sketch below outlines both checks.)
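Below is a minimal C sketch of the two checks just described, using invented data structures and names (store_buffer, load_queue, age fields): a load searching the store buffer for a matching older store, and an issuing store scanning already-issued younger loads for a memory-order violation. Real hardware also handles partially overlapping addresses and sequence-number wraparound, which this sketch ignores.

#include <stdbool.h>
#include <stdint.h>

#define SB_ENTRIES 16
#define LQ_ENTRIES 32

struct sb_entry { bool valid;  uint64_t addr; uint64_t data; unsigned age; };
struct lq_entry { bool issued; uint64_t addr; unsigned age; };

struct sb_entry store_buffer[SB_ENTRIES];
struct lq_entry load_queue[LQ_ENTRIES];

/* On load issue: forward from the youngest matching *older* store;
 * if none matches, the caller reads the cache hierarchy instead. */
bool load_lookup(uint64_t addr, unsigned load_age, uint64_t *data) {
    int best = -1;
    for (int i = 0; i < SB_ENTRIES; i++) {
        if (store_buffer[i].valid &&
            store_buffer[i].addr == addr &&
            store_buffer[i].age < load_age &&          /* older store */
            (best < 0 || store_buffer[i].age > store_buffer[best].age))
            best = i;
    }
    if (best >= 0) { *data = store_buffer[best].data; return true; }
    return false;
}

/* On store issue: any already-issued *younger* load to the same address
 * consumed a stale value -- it and its dependents must be squashed
 * and re-executed. */
bool store_check_violation(uint64_t addr, unsigned store_age) {
    for (int i = 0; i < LQ_ENTRIES; i++) {
        if (load_queue[i].issued &&
            load_queue[i].addr == addr &&
            load_queue[i].age > store_age)             /* younger load */
            return true;
    }
    return false;
}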
Load/store ordering
- Out-of-order load issue relies on speculative memory disambiguation
- Assumes that there will be no conflicting store
- If the speculation is correct, you have issued the load much earlier and allowed its dependents to execute much earlier as well
- If there is a conflicting store, you have to squash the load and all the dependents that consumed the load value, and re-execute them systematically
- Turns out that the speculation is correct most of the time
- To further reduce load squashes, microprocessors use simple memory dependence predictors, which predict whether a load will conflict with a pending store based on that load's (or load/store pair's) past behavior (a simple predictor is sketched below)
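As an illustration, here is a C sketch of one simple flavor of memory dependence predictor: a PC-indexed table of 2-bit saturating counters (the table size, index hash, and thresholds are made up for this sketch, not taken from any particular design). A load whose counter is high is predicted to conflict and waits for older stores to issue; otherwise it issues speculatively. Real designs, such as store sets, additionally remember which store a load conflicted with.

#include <stdbool.h>
#include <stdint.h>

#define MDP_ENTRIES 1024   /* illustrative size */

static uint8_t mdp_counter[MDP_ENTRIES];   /* 2-bit saturating counters */

static unsigned mdp_index(uint64_t load_pc) {
    return (unsigned)((load_pc >> 2) & (MDP_ENTRIES - 1));
}

/* Predict at issue time: true = likely to conflict, so hold the load
 * until older stores have issued instead of speculating. */
bool mdp_predict_conflict(uint64_t load_pc) {
    return mdp_counter[mdp_index(load_pc)] >= 2;
}

/* Train with the actual outcome once the load's ordering is resolved. */
void mdp_update(uint64_t load_pc, bool conflicted) {
    uint8_t *c = &mdp_counter[mdp_index(load_pc)];
    if (conflicted) {
        if (*c < 3) (*c)++;   /* load caused a squash: be more conservative */
    } else {
        if (*c > 0) (*c)--;   /* no conflict: drift back toward speculating */
    }
}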