Module 4: "Recap: Virtual Memory and Caches"
  Lecture 8: "Cache Hierarchy and Memory-level Parallelism"
 

The first instruction

  • Accessing the first instruction
    • Take the starting PC
    • Access iTLB with the VPN extracted from PC: iTLB miss
    • Invoke iTLB miss handler
    • Calculate PTE address
    • PTEs may be cached in the L1 data and L2 caches, so look them up with the PTE address: these lookups also miss (the caches are cold)
    • Access page table in main memory: PTE is invalid: page fault
    • Invoke page fault handler
    • Allocate a page frame, read the page from disk, update the PTE, load the PTE into the iTLB, restart fetch
  • Now you have the physical address
    • Access Icache: miss
    • Send refill request to higher levels: you miss everywhere
    • Send request to memory controller (north bridge)
    • Access main memory
    • Read cache line
    • Refill all levels of cache as the cache line returns to the processor
    • Extract the appropriate instruction from the cache line with the block offset
  • This is the longest possible latency in an instruction/data access

TLB access

  • For every cache access (instruction or data) the TLB must be accessed first
  • This puts the TLB on the critical path
  • Want to start indexing into the cache and reading the tags while the TLB lookup takes place
    • Virtually indexed physically tagged cache
    • Extract index from the VA, start reading tag while looking up TLB
    • Once the PA is available do tag comparison
    • Overlaps TLB reading and tag reading

Memory op latency

  • L1 hit: ~1 ns
  • L2 hit: ~5 ns
  • L3 hit: ~10-15 ns
  • Main memory: ~70 ns DRAM access time + bus transfer etc. = ~110-120 ns
  • If a load misses in all caches it will eventually come to the head of the ROB and block instruction retirement (in-order retirement is a must)
  • Gradually, the pipeline backs up and the processor runs out of resources such as ROB entries and physical registers
  • Ultimately, the fetcher stalls: severely limits ILP