Module 2: Virtual Memory and Caches
  Lecture 4: Cache Hierarchy and Memory-level Parallelism
 


Cache Hierarchy

  • Ideally want to hold everything in a fast cache
    • Never want to go to the memory
  • But access time increases as the cache grows
  • A single large cache would slow down every access
  • So, put increasingly bigger and slower caches between the processor and the memory
  • Keep the most recently used data in the nearest cache: register file (RF)
  • Next level of cache: level 1 or L1 (same speed or slightly slower than RF, but much bigger)
  • Then L2: much bigger than L1 and much slower
  • Example: Intel Pentium 4 (NetBurst)
    • 128 registers accessible in 2 cycles
    • L1 data cache: 8 KB, 4-way set associative, 64-byte line size, accessible in 2 cycles for integer loads
    • L2 cache: 256 KB, 8-way set associative, 128-byte line size, accessible in 7 cycles
  • Example: Intel Itanium 2 (code name Madison)
    • 128 registers accessible in 1 cycle
    • L1 instruction and data caches: each 16 KB, 4-way set associative, 64-byte line size, accessible in 1 cycle
    • Unified L2 cache: 256 KB, 8-way set associative, 128-byte line size, accessible in 5 cycles
    • Unified L3 cache: 6 MB, 24-way set associative, 128-byte line size, accessible in 14 cycles
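
The parameters listed in the two examples fix the geometry of each cache: the number of sets is the capacity divided by (line size × associativity). The sketch below works this out for the listed data and unified caches. The class name CacheLevel and its field names are illustrative choices, not part of the lecture, and it assumes KB and MB mean 2^10 and 2^20 bytes and that the offset and index fields are taken from the low-order address bits in the usual way.

```python
from dataclasses import dataclass

KB = 1024          # assumption: binary units
MB = 1024 * KB


@dataclass(frozen=True)
class CacheLevel:
    """One cache level described by the parameters quoted in the examples."""
    name: str
    size_bytes: int
    ways: int            # associativity
    line_bytes: int
    latency_cycles: int

    @property
    def num_lines(self) -> int:
        return self.size_bytes // self.line_bytes

    @property
    def num_sets(self) -> int:
        # Each set holds `ways` lines.
        return self.num_lines // self.ways

    @property
    def offset_bits(self) -> int:
        # Bits needed to select a byte within a line (line size is a power of two).
        return self.line_bytes.bit_length() - 1

    @property
    def index_bits(self) -> int:
        # Bits needed to select a set (set count is a power of two here).
        return self.num_sets.bit_length() - 1


# Figures copied from the Pentium 4 and Itanium 2 examples.
PENTIUM4 = [
    CacheLevel("L1 data", 8 * KB, 4, 64, 2),
    CacheLevel("L2", 256 * KB, 8, 128, 7),
]
ITANIUM2 = [
    CacheLevel("L1 data", 16 * KB, 4, 64, 1),
    CacheLevel("L2", 256 * KB, 8, 128, 5),
    CacheLevel("L3", 6 * MB, 24, 128, 14),
]


def describe(level: CacheLevel) -> str:
    """Summarize one level: set count, address-field widths, latency."""
    return (f"{level.name}: {level.num_sets} sets, "
            f"{level.offset_bits} offset bits, {level.index_bits} index bits, "
            f"{level.latency_cycles} cycles")


if __name__ == "__main__":
    for chip, levels in (("Pentium 4", PENTIUM4), ("Itanium 2", ITANIUM2)):
        print(chip)
        for level in levels:
            print("  " + describe(level))
```

For instance, the Pentium 4 L1 data cache has 8 KB / (64 B × 4) = 32 sets, while the Itanium 2 L3 has 6 MB / (128 B × 24) = 2048 sets. The set count grows with each level along with the access latency.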