Exploiting TLP: NOW
- Simplest solution: take commodity boxes, connect them over Gigabit Ethernet, and let them talk via messages
- The simplest possible message-passing machine
- Also known as Network of Workstations (NOW)
- Normally PVM (Parallel Virtual Machine) or MPI (Message Passing Interface) is used for programming
- Each processor sees only local memory
- Any remote data access must happen through explicit messages (send/recv calls trapping into the kernel); see the MPI sketch after this list
- Optimizations in the messaging layer are possible (user-level messages, active messages)
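To make the send/recv style concrete, here is a minimal MPI sketch (the payload value and message tag are arbitrary choices for illustration): rank 0 ships one integer to rank 1 with an explicit MPI_Send, and rank 1 blocks in MPI_Recv until the message arrives.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = 0;
    if (rank == 0) {
        value = 42;  /* arbitrary payload */
        /* Remote data moves only via an explicit send... */
        MPI_Send(&value, 1, MPI_INT, /*dest=*/1, /*tag=*/0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* ...matched by an explicit receive on the other node. */
        MPI_Recv(&value, 1, MPI_INT, /*source=*/0, /*tag=*/0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

Built and launched with, e.g., `mpicc send_recv.c && mpirun -np 2 ./a.out`. Each such send/recv pair is exactly the kind of call that may trap into the kernel on a plain NOW, which is what user-level and active-message layers try to avoid.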
Supercomputers
- Historically used for scientific computing
- Initially used vector processors
- But the uniprocessor performance gap between vector processors and microprocessors is narrowing
- Microprocessors now have heavily pipelined floating-point units, large on-chip caches, and modern techniques to extract ILP
- Microprocessor-based supercomputers come at large scale: 100 to 1000 processors (called massively parallel processors, or MPPs)
- However, vector-processor-based supercomputers are much smaller in scale due to their cost disadvantage
- Cray finally decided to use the Alpha µP in the T3D
Exploiting TLP: Shared memory
- Hard to build, but offers better programmability than message-passing clusters
- The “conventional” load/store architecture continues to work
- Communication takes place through load/store instructions
- Central to the design: a cache coherence protocol
- Keeps the data in the different caches coherent
- Special care is needed for synchronization (see the sketch after this list)
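A minimal pthreads sketch of the same communication on a shared-memory machine (the names and the value 42 are illustrative): the producer communicates through an ordinary store, the reader through an ordinary load, and the cache coherence protocol keeps the cached copies consistent; the mutex supplies the synchronization noted above.

```c
#include <pthread.h>
#include <stdio.h>

/* Shared variables: both threads touch them with plain loads/stores;
 * hardware cache coherence keeps the cached copies consistent. */
static int shared_data = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *producer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);   /* synchronization needs explicit care */
    shared_data = 42;            /* an ordinary store communicates the value */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);
    pthread_join(t, NULL);

    pthread_mutex_lock(&lock);
    printf("read %d with an ordinary load\n", shared_data);  /* no send/recv */
    pthread_mutex_unlock(&lock);
    return 0;
}
```

Compile with `cc -pthread shared.c`. Compare with the MPI version: there are no explicit messages; the communication contract is carried instead by the coherence protocol and the lock.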