Module 4: "Recap: Virtual Memory and Caches"
  Lecture 8: "Cache Hierarchy and Memory-level Parallelism"
 

Cache hierarchy

  • Ideally want to hold everything in a fast cache
    • Never want to go to the memory
  • But access time increases with cache size (a sketch after this list quantifies this trade-off)
  • A large cache will slow down every access
  • So, put increasingly bigger and slower caches between the processor and the memory
  • Keep the most recently used data in the nearest cache: register file (RF)
  • Next level of cache: level 1 or L1 (same speed or slightly slower than RF, but much bigger)
  • Then L2: way bigger than L1 and much slower
  • Example: Intel Pentium 4 (Netburst)
    • 128 registers accessible in 2 cycles
    • L1 data cache: 8 KB, 4-way set associative, 64 bytes line size, accessible in 2 cycles for integer loads
    • L2 cache: 256 KB, 8-way set associative, 128 bytes line size, accessible in 7 cycles
  • Example: Intel Itanium 2 (code name Madison)
    • 128 registers accessible in 1 cycle
    • L1 instruction and data caches: each 16 KB, 4-way set associative, 64 bytes line size, accessible in 1 cycle
    • Unified L2 cache: 256 KB, 8-way set associative, 128 bytes line size, accessible in 5 cycles
    • Unified L3 cache: 6 MB, 24-way set associative, 128 bytes line size, accessible in 14 cycles
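
The size/latency trade-off can be quantified with the standard average memory access time (AMAT) recurrence, AMAT = t_L1 + m_L1 * (t_L2 + m_L2 * (t_L3 + m_L3 * t_mem)). The sketch below plugs in the Itanium 2 (Madison) latencies quoted above; the per-level local miss rates and the 200-cycle main memory latency are assumed values chosen only for illustration, not figures from the lecture.

  #include <cstdio>

  // Average memory access time for a three-level hierarchy.
  // Latencies are the Itanium 2 (Madison) numbers quoted above;
  // the local miss rates and the 200-cycle memory latency are
  // assumed values used only for illustration.
  int main() {
      const double t_l1 = 1.0, t_l2 = 5.0, t_l3 = 14.0, t_mem = 200.0; // cycles
      const double m_l1 = 0.05, m_l2 = 0.25, m_l3 = 0.40;              // local miss rates (assumed)

      double amat = t_l1 + m_l1 * (t_l2 + m_l2 * (t_l3 + m_l3 * t_mem));
      std::printf("AMAT = %.2f cycles\n", amat);   // about 2.4 cycles with these numbers
      return 0;
  }

Even with single-digit miss rates at each level, the slow outer levels dominate the extra cycles, which is why the hierarchy tries to catch most accesses in the small, fast caches near the processor.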

States of a cache line

  • The life of a cache line starts off in invalid state (I)
  • An access to that line incurs a cache miss and fetches the line from main memory
  • If it was a read miss the line is filled in shared state (S) [we will discuss it later; for now just assume that this is equivalent to a valid state]
  • In case of a store miss the line is filled in modified state (M); instruction cache lines do not normally enter the M state (no store to Icache)
  • The eviction of a line in M state must write the line back to the memory (this is called a writeback cache); otherwise the effect of the store would be lost (these transitions are sketched below)
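
A minimal sketch of the per-line state machine just described: a line starts in I, a read miss fills it in S, a store takes it to M, and evicting an M line forces a writeback. The upgrade of an S line to M on a store hit is included as the natural next transition even though the bullets above only cover the miss cases; tags, replacement, and coherence traffic are all omitted.

  #include <cstdio>

  enum class State { I, S, M };   // invalid, shared, modified

  struct CacheLine {
      State state = State::I;     // every line starts life invalid

      void read() {
          if (state == State::I) {             // read miss
              std::puts("read miss: fetch line, fill in S");
              state = State::S;
          }                                    // read hit in S or M: no change
      }

      void write() {
          if (state == State::I)               // store miss
              std::puts("store miss: fetch line, fill in M");
          else if (state == State::S)          // store hit on a clean line
              std::puts("store hit on S line: upgrade to M");
          state = State::M;
      }

      void evict() {
          if (state == State::M)               // dirty line: writeback cache
              std::puts("evicting M line: write back to memory");
          state = State::I;
      }
  };

  int main() {
      CacheLine line;
      line.read();    // I -> S
      line.write();   // S -> M
      line.evict();   // M -> I, with writeback
      return 0;
  }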

Inclusion policy

  • A cache hierarchy implements inclusion if the contents of the level n cache (excluding the register file) are a subset of the contents of the level n+1 cache
    • Eviction of a line from L2 must ask L1 caches (both instruction and data) to invalidate that line if present
    • A store miss fills the L2 cache line in M state, but the store really happens in L1 data cache; so L2 cache does not have the most up-to-date copy of the line
    • Eviction of an L1 line in M state writes back the line to L2
    • Eviction of an L2 line in M state first asks the L1 data cache to send the most up-to-date copy (if any), then writes the line back to the next level of the hierarchy (L3 or main memory); the sketch after this list traces this sequence
    • Inclusion simplifies the on-chip coherence protocol (more later)
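
A toy sketch of the inclusive L1-data/L2 interaction described above, assuming write-allocate and ignoring the instruction cache, associativity, and real replacement. Each level is just a map from line address to state, with a missing entry standing for the invalid state; the names (TwoLevel, store_miss, evict_l1, evict_l2) are made up for this illustration.

  #include <cstdio>
  #include <unordered_map>

  enum class State { S, M };      // a missing map entry means invalid

  struct TwoLevel {
      std::unordered_map<unsigned long long, State> l1d, l2;

      // Store miss: L2 fills the line in M for bookkeeping, but the
      // store itself lands in the L1 data cache, so L1 (not L2) holds
      // the most up-to-date copy.
      void store_miss(unsigned long long addr) {
          l2[addr]  = State::M;
          l1d[addr] = State::M;
      }

      // Evicting an L1 line in M state writes it back to L2.
      void evict_l1(unsigned long long addr) {
          auto it = l1d.find(addr);
          if (it == l1d.end()) return;
          if (it->second == State::M) {
              std::printf("L1 eviction of %#llx: write back to L2\n", addr);
              l2[addr] = State::M;
          }
          l1d.erase(it);
      }

      // Evicting an L2 line: inclusion forces a back-invalidation of L1;
      // if L1 holds the line in M, pull the dirty copy first, then write
      // the line back to the next level (L3 or main memory).
      void evict_l2(unsigned long long addr) {
          auto it = l1d.find(addr);
          if (it != l1d.end()) {
              if (it->second == State::M)
                  std::printf("L2 eviction of %#llx: fetch dirty copy from L1\n", addr);
              l1d.erase(it);                 // back-invalidate to keep inclusion
          }
          auto jt = l2.find(addr);
          if (jt != l2.end()) {
              if (jt->second == State::M)
                  std::printf("L2 eviction of %#llx: write back to next level\n", addr);
              l2.erase(jt);
          }
      }
  };

  int main() {
      TwoLevel h;
      h.store_miss(0x1000);   // line is M in both levels, data lives in L1
      h.evict_l2(0x1000);     // back-invalidate L1, then write back
      return 0;
  }

The back-invalidation is what makes inclusion attractive for coherence: a request from another cache or agent that misses in L2 is guaranteed not to hit in L1 either, so only L2 needs to be consulted.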