Module 2: Virtual Memory and Caches
  Lecture 4: Cache Hierarchy and Memory-level Parallelism
 


Load/store Ordering

  • Out-of-order load issue relies on speculative memory disambiguation
    • Assumes that no older, not-yet-resolved store will write to the same address
    • If the speculation is correct, the load issues much earlier, and its dependents are allowed to execute much earlier as well
    • If there is a conflicting store, the load and every dependent that consumed the stale load value must be squashed and re-executed
    • In practice, the speculation turns out to be correct most of the time
    • To further reduce load squashes, microprocessors use simple memory dependence predictors, which predict whether a load will conflict with a pending store based on the past behavior of that load or of load/store pairs (a software sketch follows this list)
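  To make the mechanism concrete, here is a minimal software sketch of a PC-indexed dependence predictor built from 2-bit saturating counters. Everything here (the table size, the threshold, the function names) is a hypothetical illustration, not any shipping design; real predictors such as store sets track load/store pairs rather than individual loads.

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy model of a PC-indexed memory dependence predictor.
 * Illustrative only: sizes and names are assumptions. */
#define TABLE_SIZE 1024   /* entries, indexed by hashed load PC */
#define THRESHOLD  2      /* counter value at which we predict "conflict" */

static uint8_t conflict_ctr[TABLE_SIZE];  /* 2-bit saturating counters */

static inline unsigned index_of(uint64_t load_pc) {
    return (load_pc >> 2) & (TABLE_SIZE - 1);  /* drop byte offset, mask */
}

/* At issue time: should this load wait for older stores to resolve? */
bool predict_conflict(uint64_t load_pc) {
    return conflict_ctr[index_of(load_pc)] >= THRESHOLD;
}

/* At commit/squash time: train the predictor with the actual outcome. */
void train(uint64_t load_pc, bool did_conflict) {
    uint8_t *c = &conflict_ctr[index_of(load_pc)];
    if (did_conflict) {
        if (*c < 3) (*c)++;   /* saturate high: hold this load back */
    } else {
        if (*c > 0) (*c)--;   /* decay: let this load speculate freely */
    }
}
```

  A load whose counter has saturated is held back until older store addresses resolve; a load that keeps speculating correctly decays back toward issuing freely, which is how the predictor adapts to each load's past behavior.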

MLP (Memory-Level Parallelism) and the Memory Wall

  • Today's microprocessors try to hide cache misses by initiating prefetches early:
    • Hardware prefetchers try to predict the next several load addresses and prefetch the corresponding cache lines if they are not already in the cache
    • Most processors today also expose software prefetch instructions, so the program itself can specify what to prefetch and when; this gives much finer control than a hardware prefetcher (see the sketch after this list)
  • Researchers are also working on load value prediction, which speculatively supplies a load's value before the memory access completes
  • Even with all of these techniques, memory latency remains the biggest bottleneck
  • Today's microprocessors are all trying to overcome one single wall: the memory wall
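  As a concrete example of software prefetching, the sketch below uses the GCC/Clang builtin __builtin_prefetch on an irregular, index-driven access pattern that a hardware stride prefetcher cannot predict. The function itself, the PF_DIST lookahead distance, and the locality hint are illustrative assumptions, not measured recommendations.

```c
#include <stddef.h>

/* Gather-sum through an index array: the pattern a[idx[i]] is
 * irregular, so a hardware prefetcher cannot predict it, but the
 * program knows the future addresses and can prefetch them itself.
 * PF_DIST is a tuning parameter chosen here purely for illustration. */
#define PF_DIST 8  /* how many iterations ahead to prefetch */

double gather_sum(const double *a, const size_t *idx, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PF_DIST < n)
            /* args: address, rw (0 = read), temporal locality (0-3) */
            __builtin_prefetch(&a[idx[i + PF_DIST]], 0, 1);
        s += a[idx[i]];
    }
    return s;
}
```

  Several of these prefetches can be outstanding at once; overlapping independent misses this way is precisely the memory-level parallelism that helps hide, though not remove, the memory wall.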