Cache hierarchy
- Ideally want to hold everything in a fast cache
- Never want to go to the memory
- But access time increases with cache size
- A large cache will slow down every access
- So, put progressively bigger and slower caches between the processor and the memory (a latency sketch follows this list)
- Keep the most recently used data in the nearest cache: register file (RF)
- Next level of cache: level 1 or L1 (same speed or slightly slower than RF, but much bigger)
- Then L2: way bigger than L1 and much slower
- Example: Intel Pentium 4 (Netburst)
- 128 registers accessible in 2 cycles
- L1 data cache: 8 KB, 4-way set associative, 64-byte line size, accessible in 2 cycles for integer loads
- L2 cache: 256 KB, 8-way set associative, 128-byte line size, accessible in 7 cycles
- Example: Intel Itanium 2 (code name Madison)
- 128 registers accessible in 1 cycle
- L1 instruction and data caches: each 16 KB, 4-way set associative, 64-byte line size, accessible in 1 cycle
- Unified L2 cache: 256 KB, 8-way set associative, 128-byte line size, accessible in 5 cycles
- Unified L3 cache: 6 MB, 24-way set associative, 128-byte line size, accessible in 14 cycles
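
To see why stacking progressively larger, slower caches pays off, here is a minimal back-of-the-envelope sketch in C that computes the average memory access time for a three-level hierarchy. The latencies are the Itanium 2 numbers listed above; the hit rates and the 200-cycle memory latency are assumed values for illustration only.

#include <stdio.h>

/* Average memory access time (AMAT) for a three-level hierarchy.
 * Latencies follow the Itanium 2 (Madison) numbers listed above;
 * the hit rates and the memory latency are assumed, not measured. */
int main(void)
{
    const double l1_lat = 1.0, l2_lat = 5.0, l3_lat = 14.0;   /* cycles */
    const double mem_lat = 200.0;                             /* assumed */
    const double l1_hit = 0.95, l2_hit = 0.80, l3_hit = 0.70; /* assumed */

    /* Each level's latency is paid only by accesses that miss
     * in every nearer level. */
    double amat = l1_lat
                + (1.0 - l1_hit) * (l2_lat
                + (1.0 - l2_hit) * (l3_lat
                + (1.0 - l3_hit) * mem_lat));

    printf("AMAT = %.2f cycles\n", amat); /* about 2.0 cycles with these rates */
    return 0;
}

With these assumed rates the average access costs about 2 cycles: most accesses are served by the upper levels, which is the whole point of the hierarchy.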
States of a cache line
- The life of a cache line starts off in invalid state (I)
- An access to that line incurs a cache miss and fetches the line from main memory
- If it was a read miss, the line is filled in the shared state (S) [we will discuss it later; for now just assume that this is equivalent to a valid state]
- In case of a store miss, the line is filled in the modified state (M); instruction cache lines do not normally enter the M state (no stores to the Icache)
- The eviction of a line in M state must write the line back to the memory (this is called a writeback cache); otherwise the effect of the store would be lost (a minimal state-machine sketch follows this list)
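
The life cycle above (I on reset, S on a read miss, M on a store, writeback on eviction of an M line) can be captured in a few lines of C. This is a minimal sketch with assumed type and function names, not the code of any real cache controller.

#include <stdbool.h>
#include <stdio.h>

/* Per-line states described above: invalid, shared (valid/clean), modified (dirty). */
typedef enum { INVALID, SHARED, MODIFIED } line_state_t;

typedef struct {
    line_state_t state;
    unsigned long tag;
} cache_line_t;

/* A miss fetches the line from memory: a read miss fills it in S,
 * a store miss fills it in M (data caches only; the Icache never stores). */
void fill_on_miss(cache_line_t *line, unsigned long tag, bool is_store)
{
    line->tag = tag;
    line->state = is_store ? MODIFIED : SHARED;
}

/* A store hit on an S line upgrades it to M. */
void store_hit(cache_line_t *line)
{
    line->state = MODIFIED;
}

/* Eviction of an M line must write the data back (writeback cache);
 * otherwise the effect of the store would be lost. */
void evict(cache_line_t *line)
{
    if (line->state == MODIFIED)
        printf("writeback of line 0x%lx to memory\n", line->tag);
    line->state = INVALID;
}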
Inclusion policy
- A cache hierarchy implements inclusion if the contents of the level n cache (excluding the register file) are a subset of the contents of the level n+1 cache
- Eviction of a line from L2 must ask L1 caches (both instruction and data) to invalidate that line if present
- A store miss fills the L2 cache line in M state, but the store really happens in L1 data cache; so L2 cache does not have the most up-to-date copy of the line
- Eviction of an L1 line in M state writes back the line to L2
- Eviction of an L2 line in M state first asks the L1 data cache to send the most up-to-date copy (if any), then writes the line back to the next level (L3 or main memory); a sketch of this eviction handling follows this list
- Inclusion simplifies the on-chip coherence protocol (more later)
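
What inclusion implies on an L2 eviction can also be sketched in C. The names and structure below are assumptions for illustration: both L1 caches are asked to invalidate their copy, and a dirty copy in the L1 data cache supplies the data that is written back to the next level.

#include <stdbool.h>
#include <stdio.h>

typedef enum { INVALID, SHARED, MODIFIED } line_state_t;

typedef struct {
    line_state_t state;
    unsigned long tag;
} line_t;

/* Back-invalidation: drop the L1 copy of the evicted line, if present.
 * Returns true if the L1 copy was in M, i.e. newer than what L2 holds. */
static bool back_invalidate(line_t *l1_line, unsigned long tag)
{
    bool dirty = false;
    if (l1_line->state != INVALID && l1_line->tag == tag) {
        dirty = (l1_line->state == MODIFIED);
        l1_line->state = INVALID;
    }
    return dirty;
}

/* Evicting an L2 line under inclusion: invalidate both L1 copies,
 * then write back if either L2 or the L1 data cache held the line dirty. */
void evict_l2_line(line_t *l2_line, line_t *l1i_line, line_t *l1d_line)
{
    bool l1d_dirty = back_invalidate(l1d_line, l2_line->tag);
    (void)back_invalidate(l1i_line, l2_line->tag); /* Icache copy is never dirty */

    if (l2_line->state == MODIFIED || l1d_dirty)
        printf("writeback of line 0x%lx to L3/memory\n", l2_line->tag);
    l2_line->state = INVALID;
}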