Module 14: "Directory-based Cache Coherence"
  Lecture 31: "Managing Directory Overhead"
 

Page migration

  • Page migration changes the existing VA to PA mapping of the migrated page
    • Requires notifying all TLBs caching the old mapping
    • Introduces a TLB coherence problem
  • Origin 2000 uses a smart page migration algorithm: allows the page copy and TLB shootdown to proceed in parallel
    • Array of 64 page reference counters per directory entry to decide whether to migrate a page or not: compare requester’s counter against home’s and send an interrupt to home if migration is required
  • What does the interrupt handler do?
    • Access all directory entries of the lines belonging to the to-be-migrated page
    • Send invalidations to sharers or interventions to owners; at the end all cache lines of that page must be in memory
    • Set the poison bits in the directory entries of all the cache lines of the page
    • Start a block transfer of the page from home to requester at this point (30 μs to copy 16 KB)
  • An access to a poisoned cache line from a node results in a bus error which invalidates the TLB entry for that page in the requesting node (avoids broadcast shootdown)
  • Until the page is completely migrated and is assigned a physical page frame on the target node, all nodes accessing a poisoned line wait in a pending queue
  • After the page copy is completed the waiting nodes are served one by one; however, the directory entries and the page itself are moved to a “poisoned list” and are not yet freed at the home (i.e. you still cannot use that physical page frame)
  • On every scheduler tick the kernel invalidates one TLB entry per processor
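  • After a time equal to the number of TLB entries per processor multiplied by the scheduling quantum, every TLB entry has been invalidated at least once, so no stale mapping can remain; the page frame is then marked free and is removed from the poisoned list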
  • Major advantage: requesting nodes see only the page copy latency (including invalidations and interventions) on the critical path, but not the TLB shootdown latency
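
The migration steps above can be modeled with a small sketch. This is an illustration only, not the Origin 2000 implementation: the names should_migrate, MigratingPage, and reclaim_delay_ms are hypothetical, and the threshold-based comparison is an assumption (the notes only say that the requester's counter is compared against the home's).

```python
from dataclasses import dataclass, field

NUM_NODES = 64  # one reference counter per node, matching the 64 counters in the notes


def should_migrate(counters, requester, home, threshold):
    """Assumed policy: migrate when the requester's count leads the home's by more than threshold."""
    return counters[requester] - counters[home] > threshold


@dataclass
class MigratingPage:
    """Toy model of the poison bits and the pending queue for one page."""
    num_lines: int
    poisoned: list = field(default_factory=list)   # one poison bit per cache line
    pending: list = field(default_factory=list)    # nodes waiting for the new page frame
    tlb_dropped: set = field(default_factory=set)  # nodes that already lost the stale mapping

    def start_migration(self):
        # Interrupt handler: once all lines are back in memory, poison every line.
        self.poisoned = [True] * self.num_lines

    def access(self, node, line):
        if self.poisoned and self.poisoned[line]:
            # Bus error: only the accessing node drops its TLB entry (no broadcast).
            self.tlb_dropped.add(node)
            self.pending.append(node)
            return "pending"
        return "served"

    def finish_copy(self):
        # Page copy done: waiting nodes are served one by one, in arrival order here.
        served, self.pending = self.pending, []
        return served


def reclaim_delay_ms(tlb_entries, quantum_ms):
    """Time the old frame stays on the poisoned list: one TLB entry is flushed per tick."""
    return tlb_entries * quantum_ms
```

For example, with an assumed 64-entry TLB and an assumed 10 ms scheduling quantum, reclaim_delay_ms(64, 10) gives 640 ms before the old page frame can be reused. Nodes that never touch a poisoned line are the ones this lazy flush is meant to cover.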

Queue lock in hardware

  • Stanford DASH
    • Memory controller recognizes lock accesses
      • Requires changes in compiler and instruction set
    • Marks the directory entry with contenders
    • On unlock a contender is chosen and the lock is granted to that node
    • Unlock is forced to generate a notification message to home
      • Possibly requires special cache state for lock variables or special uncached instructions for unlock if lock variables are not allowed to be cached
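
The directory-side bookkeeping for such a lock can be sketched as follows. This is a minimal model under stated assumptions, not the DASH hardware: the class DirectoryLock and its methods are hypothetical names, and granting the lock in FIFO order is an assumption (the notes only say that a contender is chosen).

```python
from collections import deque


class DirectoryLock:
    """Toy model of a lock whose contenders are tracked at the home directory."""

    def __init__(self):
        self.holder = None         # node currently holding the lock, or None
        self.contenders = deque()  # nodes recorded in the directory entry

    def acquire(self, node):
        # Lock access recognized by the memory controller at the home.
        if self.holder is None:
            self.holder = node
            return True
        if node != self.holder and node not in self.contenders:
            self.contenders.append(node)  # mark the directory entry with the contender
        return False

    def release(self, node):
        # The unlock must reach the home so that a contender can be chosen.
        if node != self.holder:
            raise ValueError("only the current holder can unlock")
        # Assumed FIFO choice among the recorded contenders.
        self.holder = self.contenders.popleft() if self.contenders else None
        return self.holder
```

In this sketch, if node 2 holds the lock and nodes 5 and 1 contend in that order, release(2) hands the lock to node 5 with a single grant message, so the waiting nodes do not need to retry through the network.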