
Dataflow architecture

Express the program as a dataflow graph
Logical processor at each node is activated when both operands are available
- Mapping of logical nodes to processing elements (PEs) is specified by the program
On finishing an operation, a message or token is sent to the destination processor
Arriving tokens are matched against a token store and a match triggers the operation
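
The token-matching rule above can be sketched in a few lines of Python. This is a minimal illustration, not a model of any real dataflow machine: the graph for (a + b) * (c - d), the node names, and the `run` function are all assumptions made for the example. Each token carries a destination node and operand port; a node fires as soon as the token store holds both of its operands.

```python
from collections import deque
import operator

# Hypothetical dataflow graph for (a + b) * (c - d).
# node name -> (operation, destination node, destination operand port)
GRAPH = {
    "add": (operator.add, "mul", 0),
    "sub": (operator.sub, "mul", 1),
    "mul": (operator.mul, None, 0),  # None marks the program output
}

def run(graph, initial_tokens):
    """Fire each node once both of its operand tokens have arrived."""
    token_store = {}                  # node -> {port: value}, tokens waiting for a partner
    pending = deque(initial_tokens)   # tokens in flight: (node, port, value)
    outputs = []
    while pending:
        node, port, value = pending.popleft()
        waiting = token_store.setdefault(node, {})
        waiting[port] = value
        if len(waiting) == 2:         # match: both operands are present
            del token_store[node]
            op, dest, dest_port = graph[node]
            result = op(waiting[0], waiting[1])
            if dest is None:
                outputs.append(result)
            else:
                # On finishing, send a token to the destination node
                pending.append((dest, dest_port, result))
    return outputs

# Tokens may arrive in any order; execution is driven by data availability.
tokens = [("add", 0, 2), ("sub", 1, 1), ("add", 1, 3), ("sub", 0, 7)]
result = run(GRAPH, tokens)
print(result)  # [30], i.e. (2 + 3) * (7 - 1)
```

Note that no program counter appears anywhere: the order of operations is determined only by when operands become available.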

Systolic arrays

Replace the pipeline within a sequential processor by an array of PEs
Each PE may have small instruction and data memory and may carry out a different operation
Data proceeds through the array at regular “heartbeats” (hence the name)
The dataflow may be multi-directional or optimized for specific algorithms
- Optimize the interconnect for a specific application (not necessarily a linear topology)
Practical implementation in iWARP
- Uses general purpose processors as PEs
- Dedicated channels between PEs for direct register-to-register communication
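
As a rough illustration of the heartbeat idea, the sketch below simulates a linear systolic array computing a matrix-vector product y = A x. The design is an assumption chosen for the example (it is not iWARP and not taken from the lecture): PE j keeps x[j] in its local data memory, rows of A enter skewed by one beat per PE, and partial sums move one PE to the right on every beat. The function name `systolic_matvec` is hypothetical.

```python
def systolic_matvec(A, x):
    """Simulate a linear systolic array of len(x) PEs computing y = A x."""
    m, n = len(A), len(x)
    regs = [0] * n            # output register of each PE (partial sum for its right neighbour)
    y = []
    for beat in range(m + n - 1):         # one loop iteration = one heartbeat
        new_regs = [0] * n
        for j in range(n):                # all PEs work in parallel within a beat
            i = beat - j                  # row whose partial sum reaches PE j on this beat
            if 0 <= i < m:
                incoming = regs[j - 1] if j > 0 else 0
                new_regs[j] = incoming + A[i][j] * x[j]
        regs = new_regs
        i_out = beat - (n - 1)            # row that has now passed through every PE
        if 0 <= i_out < m:
            y.append(regs[n - 1])
    return y

print(systolic_matvec([[1, 2], [3, 4]], [1, 1]))  # [3, 7]
```

Once the array is full, one result leaves the last PE on every beat, and each PE communicates only with its immediate neighbour.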

A generic architecture

In all the architectures discussed thus far, a node essentially contains processor(s) with caches, memory, and a communication assist (CA)
- CA = network interface (NI) + communication controller
The nodes are connected over a scalable network
The main difference between these architectures lies in the design of the CA
- Even under a particular programming model (e.g., shared memory), there are many choices in the design of the CA
- Most innovations in parallel architecture take place in the communication assist (also called communication controller or node controller)
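
The node structure described here can be sketched as a toy model. Everything in it is hypothetical and chosen only for illustration: the class names, the single "read" message type, and the dictionary standing in for the scalable network. The point is that the processor-side code stays the same while the behaviour of remote operations is decided inside the CA.

```python
class Network:
    """Stand-in for the scalable interconnect: delivers a message to a node's CA."""
    def __init__(self):
        self.nodes = {}

    def deliver(self, dest, message):
        return self.nodes[dest].ca.receive(message)

class CommunicationAssist:
    """CA = network interface (talks to the network) + controller (interprets messages)."""
    def __init__(self, node, network):
        self.node = node
        self.network = network

    def remote_read(self, dest, addr):
        # Network-interface side: inject a request into the network
        return self.network.deliver(dest, ("read", addr))

    def receive(self, message):
        # Controller side: service a request that arrived from the network
        kind, addr = message
        if kind == "read":
            return self.node.memory[addr]
        raise ValueError(f"unknown message kind: {kind}")

class Node:
    """Processor and caches are omitted; only memory and the CA are modelled."""
    def __init__(self, node_id, network):
        self.node_id = node_id
        self.memory = {}
        self.ca = CommunicationAssist(self, network)
        network.nodes[node_id] = self

net = Network()
n0, n1 = Node(0, net), Node(1, net)
n1.memory[0x10] = 42
value = n0.ca.remote_read(dest=1, addr=0x10)
print(value)  # 42
```

Replacing `CommunicationAssist` with a variant that, for example, only supports explicit send and receive would change the programming model seen by the processor without touching the rest of the node.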