Module 4: "Recap: Virtual Memory and Caches"
  Lecture 8: "Cache Hierarchy and Memory-level Parallelism"
 

Cache hierarchy

  • Ideally want to hold everything in a fast cache
    • Never want to go to the memory
  • But access time increases with cache size (a sketch after this list quantifies this trade-off)
  • A large cache will slow down every access
  • So, put increasingly bigger and slower caches between the processor and the memory
  • Keep the most recently used data in the nearest cache: register file (RF)
  • Next level of cache: level 1 or L1 (same speed or slightly slower than RF, but much bigger)
  • Then L2: way bigger than L1 and much slower
  • Example: Intel Pentium 4 (Netburst)
    • 128 registers accessible in 2 cycles
    • L1 data cache: 8 KB, 4-way set associative, 64 bytes line size, accessible in 2 cycles for integer loads
    • L2 cache: 256 KB, 8-way set associative, 128 bytes line size, accessible in 7 cycles
  • Example: Intel Itanium 2 (code name Madison)
    • 128 registers accessible in 1 cycle
    • L1 instruction and data caches: each 16 KB, 4-way set associative, 64 bytes line size, accessible in 1 cycle
    • Unified L2 cache: 256 KB, 8-way set associative, 128 bytes line size, accessible in 5 cycles
    • Unified L3 cache: 6 MB, 24-way set associative, 128 bytes line size, accessible in 14 cycles
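
The size/latency trade-off can be quantified with the standard average memory access time (AMAT) recurrence, AMAT = t_L1 + m_L1 * (t_L2 + m_L2 * (t_L3 + m_L3 * t_mem)). The sketch below plugs in the Itanium 2 (Madison) latencies quoted above; the per-level local miss rates and the 200-cycle main memory latency are assumed values chosen only for illustration, not figures from the lecture.

  #include <cstdio>

  // Average memory access time for a three-level hierarchy.
  // Latencies are the Itanium 2 (Madison) numbers quoted above;
  // the local miss rates and the 200-cycle memory latency are
  // assumed values used only for illustration.
  int main() {
      const double t_l1 = 1.0, t_l2 = 5.0, t_l3 = 14.0, t_mem = 200.0; // cycles
      const double m_l1 = 0.05, m_l2 = 0.25, m_l3 = 0.40;              // local miss rates (assumed)

      double amat = t_l1 + m_l1 * (t_l2 + m_l2 * (t_l3 + m_l3 * t_mem));
      std::printf("AMAT = %.2f cycles\n", amat);   // about 2.4 cycles with these numbers
      return 0;
  }

Even with single-digit miss rates at each level, the slow outer levels dominate the extra cycles, which is why the hierarchy tries to catch most accesses in the small, fast caches near the processor.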

States of a cache line

  • The life of a cache line starts off in invalid state (I)
  • An access to that line incurs a cache miss and fetches the line from main memory
  • If it was a read miss the line is filled in shared state (S) [we will discuss it later; for now just assume that this is equivalent to a valid state]
  • In case of a store miss the line is filled in modified state (M); instruction cache lines do not normally enter the M state (no store to Icache)
  • The eviction of a line in M state must write the line back to the memory (this is called a writeback cache); otherwise the effect of the store would be lost (these transitions are sketched below)
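
A minimal sketch of the per-line state machine just described: a line starts in I, a read miss fills it in S, a store takes it to M, and evicting an M line forces a writeback. The upgrade of an S line to M on a store hit is included as the natural next transition even though the bullets above only cover the miss cases; tags, replacement, and coherence traffic are all omitted.

  #include <cstdio>

  enum class State { I, S, M };   // invalid, shared, modified

  struct CacheLine {
      State state = State::I;     // every line starts life invalid

      void read() {
          if (state == State::I) {             // read miss
              std::puts("read miss: fetch line, fill in S");
              state = State::S;
          }                                    // read hit in S or M: no change
      }

      void write() {
          if (state == State::I)               // store miss
              std::puts("store miss: fetch line, fill in M");
          else if (state == State::S)          // store hit on a clean line
              std::puts("store hit on S line: upgrade to M");
          state = State::M;
      }

      void evict() {
          if (state == State::M)               // dirty line: writeback cache
              std::puts("evicting M line: write back to memory");
          state = State::I;
      }
  };

  int main() {
      CacheLine line;
      line.read();    // I -> S
      line.write();   // S -> M
      line.evict();   // M -> I, with writeback
      return 0;
  }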

Inclusion policy

  • A cache hierarchy implements inclusion if the contents of the level n cache (excluding the register file) are a subset of the contents of the level n+1 cache
    • Eviction of a line from L2 must ask L1 caches (both instruction and data) to invalidate that line if present
    • A store miss fills the L2 cache line in M state, but the store really happens in L1 data cache; so L2 cache does not have the most up-to-date copy of the line
    • Eviction of an L1 line in M state writes back the line to L2
    • Eviction of an L2 line in M state first asks the L1 data cache to send the most up-to-date copy (if any), then writes the line back to the next level of the hierarchy (L3 or main memory); the sketch after this list traces this sequence
    • Inclusion simplifies the on-chip coherence protocol (more later)
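
A toy sketch of the inclusive L1-data/L2 interaction described above, assuming write-allocate and ignoring the instruction cache, associativity, and real replacement. Each level is just a map from line address to state, with a missing entry standing for the invalid state; the names (TwoLevel, store_miss, evict_l1, evict_l2) are made up for this illustration.

  #include <cstdio>
  #include <unordered_map>

  enum class State { S, M };      // a missing map entry means invalid

  struct TwoLevel {
      std::unordered_map<unsigned long long, State> l1d, l2;

      // Store miss: L2 fills the line in M for bookkeeping, but the
      // store itself lands in the L1 data cache, so L1 (not L2) holds
      // the most up-to-date copy.
      void store_miss(unsigned long long addr) {
          l2[addr]  = State::M;
          l1d[addr] = State::M;
      }

      // Evicting an L1 line in M state writes it back to L2.
      void evict_l1(unsigned long long addr) {
          auto it = l1d.find(addr);
          if (it == l1d.end()) return;
          if (it->second == State::M) {
              std::printf("L1 eviction of %#llx: write back to L2\n", addr);
              l2[addr] = State::M;
          }
          l1d.erase(it);
      }

      // Evicting an L2 line: inclusion forces a back-invalidation of L1;
      // if L1 holds the line in M, pull the dirty copy first, then write
      // the line back to the next level (L3 or main memory).
      void evict_l2(unsigned long long addr) {
          auto it = l1d.find(addr);
          if (it != l1d.end()) {
              if (it->second == State::M)
                  std::printf("L2 eviction of %#llx: fetch dirty copy from L1\n", addr);
              l1d.erase(it);                 // back-invalidate to keep inclusion
          }
          auto jt = l2.find(addr);
          if (jt != l2.end()) {
              if (jt->second == State::M)
                  std::printf("L2 eviction of %#llx: write back to next level\n", addr);
              l2.erase(jt);
          }
      }
  };

  int main() {
      TwoLevel h;
      h.store_miss(0x1000);   // line is M in both levels, data lives in L1
      h.evict_l2(0x1000);     // back-invalidate L1, then write back
      return 0;
  }

The back-invalidation is what makes inclusion attractive for coherence: a request from another cache or agent that misses in L2 is guaranteed not to hit in L1 either, so only L2 needs to be consulted.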