MLP
- Need memory-level parallelism (MLP)
- Simply speaking, need to mutually overlap several memory operations
- Step 1: Non-blocking cache
- Allow multiple outstanding cache misses
- Mutually overlap multiple cache misses
- Supported by all microprocessors today (Alpha 21364 supported 16 outstanding cache misses)
- Step 2: Out-of-order load issue
- Issue loads out of program order (the addresses of older stores may not be known yet when the load issues)
- How do you know the load did not issue before an older store to the same address? When a store issues, it must check whether a younger load to the same address has already issued (a memory-order violation)
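To see why Step 1 pays off, here is a minimal Python timing sketch of a cache that can track up to `max_outstanding` misses at once; `max_outstanding = 1` behaves like a blocking cache. The function name and parameters are hypothetical, and the model rests on simplifying assumptions: all misses are independent and ready at cycle 0, every miss has the same fixed latency, at most one new miss starts per cycle, and misses to the same block are not merged.

```python
import heapq


def miss_completion_time(num_misses: int, miss_latency: int, max_outstanding: int) -> int:
    """Cycle at which the last of `num_misses` independent misses completes.

    Idealized model (assumption): each miss holds one miss-tracking slot for
    `miss_latency` cycles, and at most one new miss can start per cycle.
    """
    if max_outstanding < 1:
        raise ValueError("need at least one outstanding-miss slot")
    slots = [0] * max_outstanding  # cycle at which each slot becomes free
    heapq.heapify(slots)
    next_issue = 0  # earliest cycle the next miss may start
    finish = 0
    for _ in range(num_misses):
        free_at = heapq.heappop(slots)       # earliest-available slot
        start = max(next_issue, free_at)     # wait for both a slot and an issue cycle
        done = start + miss_latency
        heapq.heappush(slots, done)          # slot stays busy until the miss returns
        finish = max(finish, done)
        next_issue = start + 1
    return finish


if __name__ == "__main__":
    # 16 misses, assumed 100-cycle latency each
    print(miss_completion_time(16, 100, 1))   # blocking cache: misses serialize
    print(miss_completion_time(16, 100, 16))  # 16 outstanding misses, as on the Alpha 21364
```

Under these assumptions the blocking cache finishes at cycle 1600, while allowing 16 outstanding misses finishes at cycle 115: the latencies overlap almost completely, which is exactly the MLP the slide asks for.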
Out-of-order Loads
sw r6, 0(r7)
… /* other instructions */
lw r2, 80(r20)
- Assume that the load issues before the store because r20 gets ready before r6 or r7
- The load accesses the store buffer (used for holding already executed store values before they are committed to the cache at retirement)
- If it misses in the store buffer, it looks up the cache hierarchy and obtains the value from whichever level holds it
- After several cycles the store issues and it turns out that 0(r7)==80(r20) or they overlap; now what?
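The scenario above can be sketched as a toy load/store queue in Python. This is a hypothetical model (the class, method names, and byte-level layout are invented for illustration, not taken from any real processor): program order is given by sequence numbers, memory is a byte-addressed dictionary, a load takes each byte from the youngest older store in the store buffer that covers it (otherwise from memory), and a store, when it issues, reports every younger load that has already issued and touched an overlapping byte. The check is conservative: it may flag a load that actually forwarded from an even younger store. It only detects the violation; recovery is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLSQ:
    """Hypothetical load/store queue; smaller seq means older in program order."""
    memory: dict = field(default_factory=dict)        # byte address -> byte value
    store_buffer: list = field(default_factory=list)  # (seq, addr, data) of issued, unretired stores
    issued_loads: list = field(default_factory=list)  # (seq, addr, size) of loads that already read a value

    def issue_load(self, seq: int, addr: int, size: int) -> int:
        """Read `size` bytes: youngest older store per byte, else memory (little-endian)."""
        out = bytearray()
        for a in range(addr, addr + size):
            best = None  # (store seq, byte) of youngest older store covering byte a
            for s_seq, s_addr, s_data in self.store_buffer:
                covers = s_addr <= a < s_addr + len(s_data)
                if s_seq < seq and covers and (best is None or s_seq > best[0]):
                    best = (s_seq, s_data[a - s_addr])
            out.append(best[1] if best is not None else self.memory.get(a, 0))
        self.issued_loads.append((seq, addr, size))
        return int.from_bytes(out, "little")

    def issue_store(self, seq: int, addr: int, data: bytes) -> list:
        """Buffer the store; return seqs of younger, already-issued loads it overlaps."""
        self.store_buffer.append((seq, addr, data))
        end = addr + len(data)
        return [l_seq for l_seq, l_addr, l_size in self.issued_loads
                if l_seq > seq and l_addr < end and addr < l_addr + l_size]


if __name__ == "__main__":
    # Assumed register values: r20 = 0x1000, r7 = 0x1050, so 0(r7) == 80(r20).
    old = {0x1050 + i: b for i, b in enumerate((7).to_bytes(4, "little"))}
    lsq = ToyLSQ(memory=old)
    print(lsq.issue_load(5, 0x1000 + 80, 4))                   # lw issues first: stale value
    print(lsq.issue_store(1, 0x1050, (42).to_bytes(4, "little")))  # sw issues later: flags load 5
```

Here the load (seq 5) issues first, misses in the empty store buffer, and reads the stale value 7; when the older store (seq 1) issues, it returns `[5]`, signalling the memory-order violation. Issuing the store first instead lets the load forward 42 from the store buffer.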