Module 18: "TLP on Chip: HT/SMT and CMP"
  Lecture 40: "Case Studies: IBM Power4 and IBM Power5"
 

IBM POWER5

  • Thread priority
    • Software can set the priority of each thread; the hardware (essentially the decoder) reads the priority registers to decide which thread to decode in a given cycle
    • The higher-priority thread gets more decode cycles in the long run, i.e., it injects more instructions into the pipeline
    • Eight priority levels for each thread: level 0 means idle
    • Real-time tasks get higher priority, while a thread spinning on a lock gets lower priority
    • Level 1 is the lowest priority for an active thread; if both threads are running at level 1 the processor throttles the overall decode rate to save dynamic power
  • Adaptive resource balancing
    • POWER5 uses three main hardware mechanisms to make sure that one thread does not hog the shared resources
    • If one thread is found to consume too many GCT (global completion table) entries, i.e., has too many in-flight instructions (one GCT entry holds at most 5 instructions), that thread will get fewer decode cycles until GCT occupancy reaches a balanced state (note the difference from ICOUNT)
    • If a thread has too many outstanding L2 cache misses, that thread will be given fewer decode cycles (why?)
    • If a thread is executing a sync, all instructions belonging to that thread that are waiting in the pipe at the dispatch stage will be flushed and fetching from that thread will be inhibited until sync finishes (why?)
  • Dynamic power management
    • With SMT and CMP, the average number of switching events per cycle increases, leading to higher power consumption
    • Need to reduce power consumption without losing performance: the simple solution is to clock the chip at a lower frequency, but that hurts performance
    • POWER5 employs fine-grain clock-gating: in every cycle the power management logic decides if a certain latch will be used in the next cycle; if not, it disables or gates the clock for that latch so that it will not unnecessarily switch in the next cycle
    • The clock-gating and power management logic must itself be very simple; otherwise it would consume the power it is meant to save
    • If both threads are running at priority level 1, the processor switches to a low power mode where it dispatches instructions at a much slower pace
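
The thread-priority bullets above say that the higher-priority thread gets more decode cycles, but not how many. The sketch below assumes a ratio rule of the form R = 2^(|X - Y| + 1), where X and Y are the two priority levels and the lower-priority thread gets 1 of every R decode cycles. This is how IBM's published descriptions of POWER5 are usually summarized; it is not stated in this lecture, so treat it as an assumption. The function names are hypothetical, and the special levels 0 (idle) and 1 (power-save) are deliberately left out.

```python
# Sketch of priority-based decode-cycle sharing between two SMT threads.
# ASSUMPTION: window length R = 2 ** (|X - Y| + 1); the lower-priority
# thread gets 1 decode cycle per window, the other thread gets the rest.
# All names here are illustrative, not part of any real API.


def _check(prio: int) -> None:
    # Levels 0 (idle) and 1 (low-power mode) behave specially; not modeled.
    if not 2 <= prio <= 7:
        raise ValueError("sketch covers only ordinary priority levels 2..7")


def window_length(prio_a: int, prio_b: int) -> int:
    """Length R of the repeating decode window for the two priorities."""
    _check(prio_a)
    _check(prio_b)
    return 2 ** (abs(prio_a - prio_b) + 1)


def decode_shares(prio_a: int, prio_b: int) -> tuple[float, float]:
    """Long-run fraction of decode cycles for thread A and thread B."""
    r = window_length(prio_a, prio_b)
    high, low = (r - 1) / r, 1 / r  # equal priorities give R = 2, so 0.5 each
    return (high, low) if prio_a >= prio_b else (low, high)


def decode_schedule(prio_a: int, prio_b: int, cycles: int) -> str:
    """One possible cycle-by-cycle decode order, as a string of 'A'/'B'."""
    r = window_length(prio_a, prio_b)
    favored, other = ("A", "B") if prio_a >= prio_b else ("B", "A")
    # The last slot of each window goes to the non-favored thread.
    return "".join(other if c % r == r - 1 else favored for c in range(cycles))


if __name__ == "__main__":
    print(decode_shares(4, 4), decode_schedule(4, 4, 4))
    print(decode_shares(6, 4), decode_schedule(6, 4, 8))
```

Under this assumed rule, equal priorities alternate decode cycles (`ABAB`), while a two-level gap gives the favored thread 7 of every 8 decode cycles (`AAAAAAAB`). The position of the low-priority slot within the window is an arbitrary choice made for the sketch.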

POWER5 die photo