Exploiting TLP: NOW
- Simplest solution: take commodity boxes, connect them over Gigabit Ethernet, and let them talk via messages
- The simplest possible message-passing machine
- Also known as Network of Workstations (NOW)
- Normally PVM (Parallel Virtual Machine) or MPI (Message Passing Interface) is used for programming
- Each processor sees only local memory
- Any remote data access must happen through explicit messages (send/recv calls trapping into the kernel); see the MPI sketch after this list
- Optimizations in the messaging layer are possible (user-level messages, active messages)
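To make the send/recv style concrete, here is a minimal MPI sketch (the payload value and message tag are arbitrary choices for illustration): rank 0 ships one integer to rank 1 with an explicit MPI_Send, and rank 1 blocks in MPI_Recv until the message arrives.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = 0;
    if (rank == 0) {
        value = 42;  /* arbitrary payload */
        /* Remote data moves only via an explicit send... */
        MPI_Send(&value, 1, MPI_INT, /*dest=*/1, /*tag=*/0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* ...matched by an explicit receive on the other node. */
        MPI_Recv(&value, 1, MPI_INT, /*source=*/0, /*tag=*/0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

Built and launched with, e.g., `mpicc send_recv.c && mpirun -np 2 ./a.out`. Each such send/recv pair is exactly the kind of call that may trap into the kernel on a plain NOW, which is what user-level and active-message layers try to avoid.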
Supercomputers
- Historically used for scientific computing
- Initially used vector processors
- But the uniprocessor performance gap between vector processors and microprocessors is narrowing
- Microprocessors now have heavily pipelined floating-point units, large on-chip caches, and modern techniques to extract ILP
- Microprocessor-based supercomputers come at large scale: 100 to 1000 processors (called massively parallel processors, or MPPs)
- However, vector-processor-based supercomputers are much smaller in scale due to their cost disadvantage
- Cray finally decided to use the Alpha µP in the T3D
Exploiting TLP: Shared memory
- Hard to build, but offers better programmability than message-passing clusters
- The “conventional” load/store architecture continues to work
- Communication takes place through load/store instructions
- Central to the design: a cache coherence protocol
- Keeps the data in the different caches coherent
- Special care is needed for synchronization (see the sketch after this list)
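A minimal pthreads sketch of the same communication on a shared-memory machine (the names and the value 42 are illustrative): the producer communicates through an ordinary store, the reader through an ordinary load, and the cache coherence protocol keeps the cached copies consistent; the mutex supplies the synchronization noted above.

```c
#include <pthread.h>
#include <stdio.h>

/* Shared variables: both threads touch them with plain loads/stores;
 * hardware cache coherence keeps the cached copies consistent. */
static int shared_data = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *producer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);   /* synchronization needs explicit care */
    shared_data = 42;            /* an ordinary store communicates the value */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);
    pthread_join(t, NULL);

    pthread_mutex_lock(&lock);
    printf("read %d with an ordinary load\n", shared_data);  /* no send/recv */
    pthread_mutex_unlock(&lock);
    return 0;
}
```

Compile with `cc -pthread shared.c`. Compare with the MPI version: there are no explicit messages; the communication contract is carried instead by the coherence protocol and the lock.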