Module 17: "Interconnection Networks"
  Lecture 38: "Routing Algorithms"
 

Alpha 21364 μP

Reproduced from IEEE Micro

Alpha 21364 router

  • Integrated on-chip router
    • Supports adaptive routing over a 2D torus
    • Clocked at 1.2 GHz and has a 13-cycle routing delay from input to output port (10.8 ns)
    • Eight input and seven output ports
    • Input ports: NEWS, two memory controllers, cache, I/O
    • Output ports: NEWS, L1, L2, I/O
    • Each port has 3.2 GB/s link bandwidth (links are clocked at 0.8 GHz)
    • Flit size is 39 bits: 32 bits of data/control, 7 bits of ECC
    • Seven coherence message classes: Request (3 flits), Forwarded request (3 flits), Data response (18 or 19 flits), Dataless response (2 or 3 flits), I/O write (19 flits), I/O read (3 flits), Special (1 or 3 flits; mostly used for flow control)
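
The link figures above can be cross-checked with a little arithmetic. The sketch below assumes one flit crosses a link per link-clock cycle; the notes do not state this, but it is consistent with 32 data bits per flit at 0.8 GHz giving 3.2 GB/s. The function and dictionary names are illustrative only.

```python
# Figures quoted in the notes above.
LINK_CLOCK_GHZ = 0.8   # external link clock
FLIT_DATA_BITS = 32    # data/control bits per flit
FLIT_ECC_BITS = 7      # ECC bits per flit (39 bits on the wire in total)

# (min, max) flit counts per coherence message class, as listed above.
MESSAGE_FLITS = {
    "Request": (3, 3),
    "Forwarded request": (3, 3),
    "Data response": (18, 19),
    "Dataless response": (2, 3),
    "I/O write": (19, 19),
    "I/O read": (3, 3),
    "Special": (1, 3),
}

def link_bandwidth_gb_per_s(clock_ghz=LINK_CLOCK_GHZ, data_bits=FLIT_DATA_BITS):
    """Payload bandwidth in GB/s, ASSUMING one flit per link clock."""
    return clock_ghz * data_bits / 8  # Gbit/s -> GB/s

def serialization_ns(flits, clock_ghz=LINK_CLOCK_GHZ):
    """Time in ns to place `flits` flits on a link, one flit per link clock."""
    return flits / clock_ghz

if __name__ == "__main__":
    print(f"link bandwidth: {link_bandwidth_gb_per_s():.1f} GB/s")
    for name, (lo, hi) in MESSAGE_FLITS.items():
        print(f"{name}: {serialization_ns(lo):.2f} to {serialization_ns(hi):.2f} ns")
```

Under that assumption, an 18-flit data response occupies a link for 18 / 0.8 = 22.5 ns, which is longer than the router's own input-to-output delay.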
  • Implements virtual cut-through adaptive routing
    • When a packet blocks, the router at which it blocks buffers all of its flits and is responsible for restarting the routing later; the router can buffer 316 messages
    • Simple partially adaptive routing: two choices for each input flit; either continue in the same dimension (i.e. east to west or north to south) or take a turn within the minimal rectangle containing the source and destination
    • Preference is given to continuing in the same dimension
    • Deadlock avoidance in coherence protocol: seven distinct virtual networks are provided for seven message classes
    • Deadlock avoidance in routing: three virtual channels within each virtual network; two for deadlock-free torus routing and one for adaptive routing
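
The adaptive choice described above (continue in the same dimension, or turn within the minimal rectangle) can be sketched as follows. This is a minimal illustration, not the 21364's actual logic: the coordinate convention (E = +x, N = +y), the function names, and the tie-break toward the positive direction when both ways around a ring are equally short are all assumptions made for the example.

```python
def minimal_step(cur, dst, size):
    """Return +1, -1, or 0: direction of a shortest path along one torus ring.

    Ties (both ways equally long) are broken toward +1; this is an
    assumption of the sketch, not something stated in the notes.
    """
    forward = (dst - cur) % size
    if forward == 0:
        return 0
    backward = size - forward
    return 1 if forward <= backward else -1

def candidate_outputs(cur, dst, dims, travelling_dim=None):
    """List the productive output directions at node `cur`, preferred first.

    cur, dst       -- (x, y) node coordinates
    dims           -- (width, height) of the 2D torus
    travelling_dim -- 0 if the flit arrived moving in x, 1 if in y,
                      None if it was just injected
    """
    names = {(0, 1): "E", (0, -1): "W", (1, 1): "N", (1, -1): "S"}
    steps = [minimal_step(cur[d], dst[d], dims[d]) for d in (0, 1)]
    # Only dimensions with distance left stay inside the minimal rectangle.
    productive = [d for d in (0, 1) if steps[d] != 0]
    # Prefer continuing in the dimension the flit is already travelling in.
    productive.sort(key=lambda d: d != travelling_dim)
    return [names[(d, steps[d])] for d in productive]

if __name__ == "__main__":
    # 8x8 torus, flit at (1, 1) heading to (3, 7), currently moving in y.
    print(candidate_outputs((1, 1), (3, 7), (8, 8), travelling_dim=1))
```

At most two candidates are ever returned, matching the "two choices for each input flit" above. Which virtual channel the flit then uses (adaptive or one of the two deadlock-free ones) is a separate decision not modelled here.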
  • The router pipeline
    • Input and output ports are divided into three types each: local (cache and memory controller), inter-processor (NEWS), I/O
    • Depending on its input and output ports, a packet goes through one of nine logically different routing pipelines (implemented in a single seven-stage pipe)
    • Six additional cycles are needed to account for synchronization between the router’s internal clock and the external link clock (which runs at 0.8 GHz), pad receiver and driver delay, and transport delay from pin to router (inbound) and from router to pin (outbound); together with the seven pipeline stages this gives the 13-cycle delay
    • Uses table-based routing to select output port and virtual channel (one 128-entry routing table and one virtual channel table giving assignments for three output virtual channels)
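
The pipeline numbers above fit together arithmetically: three input port types times three output port types give the nine logical pipelines, and seven pipeline stages plus six additional cycles give the 13-cycle delay quoted earlier. The short sketch below only reproduces that arithmetic; the names are illustrative.

```python
from itertools import product

ROUTER_CLOCK_GHZ = 1.2   # router's internal clock
PIPELINE_STAGES = 7      # stages in the single physical pipe
EXTRA_CYCLES = 6         # synchronization, pad, and transport delay

PORT_TYPES = ("local", "inter-processor", "I/O")

def logical_pipelines():
    """Every (input type, output type) pair; each is one logical pipeline."""
    return list(product(PORT_TYPES, repeat=2))

def input_to_output_latency(stages=PIPELINE_STAGES, extra=EXTRA_CYCLES,
                            clock_ghz=ROUTER_CLOCK_GHZ):
    """Return (cycles, nanoseconds) for the input-to-output routing delay."""
    cycles = stages + extra
    return cycles, cycles / clock_ghz

if __name__ == "__main__":
    cycles, ns = input_to_output_latency()
    print(f"{len(logical_pipelines())} logical pipelines")
    print(f"{cycles} cycles = {ns:.2f} ns at {ROUTER_CLOCK_GHZ} GHz")
```

The result, 13 / 1.2 ≈ 10.83 ns, agrees with the 10.8 ns figure given for the router.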