Load/store Ordering
- Out-of-order load issue relies on speculative memory disambiguation
- The processor assumes that there will be no conflicting older store
- If the speculation is correct, the load has issued much earlier, and its dependents have been allowed to execute much earlier as well
- If there is a conflicting store, the processor must squash the load and all the dependents that consumed the load value, and re-execute them
- It turns out that the speculation is correct most of the time
- To further reduce load squashes, microprocessors use simple memory dependence predictors, which predict whether a load will conflict with a pending store based on the past behavior of that load (or of the load/store pair)
MLP and Memory Wall
- Today microprocessors try to hide cache misses by initiating early prefetches:
- Hardware prefetchers try to predict the next several load addresses and initiate cache line prefetches if those lines are not already in the cache
- All processors today also support prefetch instructions, so you can specify in your program when to prefetch what: this gives much finer control than a hardware prefetcher
- Researchers are working on load value prediction
- Even with all these techniques, memory latency remains the biggest bottleneck
- Today microprocessors are trying to overcome one single wall: the memory wall