The first instruction
- Accessing the first instruction
- Take the starting PC
- Access iTLB with the VPN extracted from PC: iTLB miss
- Invoke iTLB miss handler
- Calculate PTE address
- If PTEs are cached in the L1 data cache and the L2 cache, look them up with the PTE address: on a cold start these lookups miss as well
- Access page table in main memory: PTE is invalid: page fault
- Invoke page fault handler
- Allocate page frame, read page from disk, update PTE, load PTE in iTLB, restart fetch
- Now you have the physical address
- Access Icache: miss
- Send refill request to the outer cache levels (L2, L3): you miss everywhere
- Send request to memory controller (north bridge)
- Access main memory
- Read cache line
- Refill all levels of cache as the cache line returns to the processor
- Extract the appropriate instruction from the cache line with the block offset
- This is the longest possible latency in an instruction/data access
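The translation half of this sequence can be sketched in a few lines of Python. This is a minimal illustration rather than a model of any real MMU: it assumes 4 KB pages, 8-byte PTEs, and a flat single-level page table, and the names `split_va`, `pte_address`, and `fetch_translate` are hypothetical. The disk read in the page fault handler is reduced to taking a frame from a free list.

```python
PAGE_SIZE = 4096                           # assumption: 4 KB pages
PTE_SIZE = 8                               # assumption: 8-byte page table entries
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 bits of page offset

def split_va(va):
    """Split a virtual address into (VPN, page offset)."""
    return va >> OFFSET_BITS, va & (PAGE_SIZE - 1)

def pte_address(page_table_base, vpn):
    """Address of the PTE for vpn, assuming a flat single-level page table."""
    return page_table_base + vpn * PTE_SIZE

def fetch_translate(pc, itlb, page_table, free_frames):
    """Translate pc and record the events on the way.

    itlb and page_table are dicts mapping VPN -> page frame number.
    A VPN missing from page_table stands for an invalid PTE.
    """
    events = []
    vpn, offset = split_va(pc)
    if vpn not in itlb:
        events.append("iTLB miss")
        if vpn not in page_table:
            events.append("page fault")
            # Page fault handler: allocate a frame and update the PTE.
            # (Reading the page from disk is not modelled here.)
            page_table[vpn] = free_frames.pop(0)
        itlb[vpn] = page_table[vpn]        # load the PTE into the iTLB
    pfn = itlb[vpn]
    return (pfn << OFFSET_BITS) | offset, events
```

Calling `fetch_translate(0x400123, {}, {}, [7])` on empty structures reports both an iTLB miss and a page fault. A second call with the same, now populated, dicts returns the same physical address with no events, which is why only the very first access pays the full cost.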
TLB access
- For every cache access (instruction or data) you need to access the TLB first
- Puts the TLB in the critical path
- Want to start indexing into cache and read the tags while TLB lookup takes place
- Virtually indexed physically tagged cache
- Extract index from the VA, start reading tag while looking up TLB
- Once the PA is available do tag comparison
- Overlaps TLB reading and tag reading
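The overlap works cleanly when the block offset and index bits together fit inside the page offset, because those bits are identical in the VA and the PA. The sketch below checks that condition for a given cache geometry. The function names and the example geometries are illustrative assumptions, not taken from a specific processor.

```python
def log2_exact(n):
    """Return log2(n), insisting that n is a power of two."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1

def vipt_layout(cache_bytes, ways, line_bytes, page_bytes):
    """Bit budget of a virtually indexed, physically tagged cache."""
    sets = cache_bytes // (ways * line_bytes)
    block_bits = log2_exact(line_bytes)
    index_bits = log2_exact(sets)
    page_offset_bits = log2_exact(page_bytes)
    return {
        "block_bits": block_bits,
        "index_bits": index_bits,
        "page_offset_bits": page_offset_bits,
        # True when the index is untouched by translation
        "index_within_page_offset": block_bits + index_bits <= page_offset_bits,
    }

def set_index(va, line_bytes, sets):
    """Set index taken directly from the virtual address."""
    return (va // line_bytes) % sets

# Assumed example: 32 KB, 8-way, 64-byte lines, 4 KB pages
layout = vipt_layout(32 * 1024, 8, 64, 4096)
```

With these assumed parameters the cache has 64 sets, so 6 block bits plus 6 index bits exactly fill the 12-bit page offset. Halving the associativity to 4 ways doubles the set count, and one index bit then comes from the VPN, so the virtual and physical index can differ.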
Memory op latency
- L1 hit: ~1 ns
- L2 hit: ~5 ns
- L3 hit: ~10-15 ns
- Main memory: ~70 ns DRAM access time + bus transfer etc., for a total of ~110-120 ns
- If a load misses in all caches it will eventually come to the head of the reorder buffer (ROB) and block instruction retirement (in-order retirement is a must)
- Gradually, the pipeline backs up, processor runs out of resources such as ROB entries and physical registers
- Ultimately, the fetcher stalls: severely limits ILP
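A rough back-of-the-envelope calculation shows how quickly the pipeline backs up. Only the 120 ns miss latency comes from the numbers above. The clock rate, pipeline width, and ROB size are assumptions chosen for illustration, and the estimate simplifies by treating the ROB as empty when the miss starts and filling at full width.

```python
def miss_stall_estimate(miss_ns, clock_ghz, width, rob_entries):
    """Estimate the cost of one load that misses in all caches.

    Returns (miss_cycles, cycles_to_fill_rob, stalled_cycles, lost_slots).
    """
    miss_cycles = miss_ns * clock_ghz          # ns * (cycles per ns)
    cycles_to_fill_rob = rob_entries / width   # best case: full-width fetch
    stalled_cycles = max(0.0, miss_cycles - cycles_to_fill_rob)
    lost_slots = stalled_cycles * width        # issue slots left empty
    return miss_cycles, cycles_to_fill_rob, stalled_cycles, lost_slots

# Assumed machine: 3 GHz clock, 4-wide, 128-entry ROB
estimate = miss_stall_estimate(miss_ns=120, clock_ghz=3.0, width=4, rob_entries=128)
```

Under these assumptions the miss lasts 360 cycles while the ROB fills in 32 cycles, so the fetcher sits idle for roughly 328 cycles, or about 1312 instruction slots, for a single miss.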