Flexible protocol engine
- Software protocol
- Executes short sequences of instructions or micro-code known as protocol handlers on a processor
- Each message type has a separate handler
- Can make the protocol complicated
- Allows late binding of the protocol: an appropriate protocol can be chosen later, and the verification path is easier
- Normally higher occupancy than hardwired controllers if controller clock is slow
- Protocol processor may use separate protocol data and code caches to speed up protocol processing
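The "one handler per message type" idea above can be sketched as a dispatch table. This is a minimal illustration only, not the code of any real machine: the message types, the `DirEntry` fields, and the handler names are all hypothetical, and the sketch ignores transient states, owner forwarding, and races.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class MsgType(Enum):
    # Hypothetical message types; a real protocol defines many more.
    READ_REQ = auto()
    WRITE_REQ = auto()
    INV_ACK = auto()


@dataclass
class DirEntry:
    # Hypothetical directory state for one memory block.
    sharers: set = field(default_factory=set)  # node ids holding a copy
    owner: Optional[int] = None                # node id with an exclusive copy
    pending_acks: int = 0                      # invalidation acks still expected


def handle_read_req(entry, src):
    # Assumes no exclusive owner exists (owner forwarding is omitted).
    entry.sharers.add(src)
    return [("DATA_REPLY", src)]


def handle_write_req(entry, src):
    # Invalidate every other sharer, then grant exclusive ownership to src.
    targets = sorted(entry.sharers - {src})
    entry.sharers = set()
    entry.owner = src
    entry.pending_acks = len(targets)
    return [("INVALIDATE", n) for n in targets] + [("DATA_EXCL_REPLY", src)]


def handle_inv_ack(entry, src):
    # One outstanding invalidation has been acknowledged.
    entry.pending_acks -= 1
    return []


# Each message type maps to its own short handler, as in a software protocol.
HANDLERS = {
    MsgType.READ_REQ: handle_read_req,
    MsgType.WRITE_REQ: handle_write_req,
    MsgType.INV_ACK: handle_inv_ack,
}


def dispatch(entry, msg_type, src):
    # Look up and run the handler; returns the list of outgoing messages.
    return HANDLERS[msg_type](entry, src)
```

Changing the protocol in this sketch means editing or replacing entries in `HANDLERS`, which is the late-binding flexibility the bullets refer to; the cost is that every message pays for a handler's worth of instructions.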
- Four existing designs
- Customized coprocessor embedded in memory controller
- ISA designed to include bit field operations: helpful for directory manipulation (bit clear, bit set, branch on bit clear, branch on bit set, find first set bit, etc.)
- Processor is normally simple, e.g., short pipeline, in-order, no floating-point unit or multiply/divide hardware
- Examples: Stanford FLASH, Sun S3.mp, Alpha Piranha CMP, Sequent STiNG, Sequent NUMA-Q
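To see why bit field operations help with directory manipulation, consider a directory entry that stores its sharers as a bit vector, one bit per node. The sketch below models the listed operations in software; the function names and the bit-vector layout are assumptions for illustration, not the instruction set of any of the machines named above.

```python
def bit_set(vec, node):
    # Mark `node` as a sharer.
    return vec | (1 << node)


def bit_clear(vec, node):
    # Remove `node` from the sharer vector.
    return vec & ~(1 << node)


def bit_is_set(vec, node):
    # The condition a "branch on bit set" instruction would test.
    return bool((vec >> node) & 1)


def find_first_set(vec):
    # Index of the lowest set bit, or -1 if the vector is empty.
    if vec == 0:
        return -1
    return (vec & -vec).bit_length() - 1


def sharers(vec):
    # Walk the sharer vector the way an invalidation loop would:
    # find the first set bit, handle that node, clear the bit, repeat.
    nodes = []
    while vec:
        node = find_first_set(vec)
        nodes.append(node)
        vec = bit_clear(vec, node)
    return nodes
```

With single-instruction versions of these operations, the invalidation loop in `sharers` costs a few instructions per sharer, whereas a general-purpose ISA without them needs shift-and-mask sequences for each step.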
- Four existing designs (continued)
- General purpose processor embedded in memory controller
- Uses commodity processor cores
- May be wasteful of resources
- Normally higher occupancy than customized coprocessor if memory clock is slow
- Example: Wisconsin Typhoon
- Four existing designs (continued)
- Execute on main processor
- Interrupts the main processor to execute the coherence protocol on a cache miss or network message arrival
- Needs an extremely low overhead interrupt mechanism to be competitive
- Grahn and Stenstrom (1995)
- Four existing designs (continued)
- Execute on a spare hardware thread context of a multi-threaded (or hyper-threaded) processor
- No interrupt overhead
- Reserves a thread context for the protocol
- Application and protocol threads co-exist in the processor (no context switch needed)
- Chaudhuri and Heinrich (2004)
- Can’t discuss in detail before talking about SMT/HT
- Possible future design
- Devote a core to protocol processing in multi-core architectures (Kalamkar, Chaudhuri, and Heinrich, 2007)
- Increasingly attractive as the number of cores increases