
2D to 4D Conversion

Essentially, you need to change the way memory is allocated. The matrix A needs to be allocated in such a way that the elements falling within a partition are contiguous.

The first two dimensions of the new 4D matrix are the block row and block column indices; e.g., for the partition assigned to processor P6 these are 1 and 2 respectively (assuming 16 processors). The next two dimensions hold the data elements within that partition. Thus the 4D array may be declared as float B[√P][√P][N/√P][N/√P]. The element B[3][2][5][10] corresponds to the element in the 10th column, 5th row of the partition of P14. Now all elements within a partition have contiguous addresses.

Transfer Granularity

How much data do you transfer in one communication? For message passing, it is explicit in the program. For shared memory, this is really under the control of the cache coherence protocol: there is a fixed size for which transactions are defined (normally the block size of the outermost level of the cache hierarchy).

In shared memory you have to be careful. Since the minimum transfer size is a cache line, you may end up transferring extra data. For example, in the grid solver with a square block decomposition, fetching the boundary elements of the left and right neighbors needs only one element per row, but the whole cache line must be transferred each time. There is no good solution to this.
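
The 2D to 4D conversion and the cache-line overhead described above can be illustrated with a minimal C sketch. Everything named here is an assumption for illustration and not part of the original slides: the sizes N = 64 and √P = 4 (so 16 processors with 16 x 16 partitions), the helper names elem, owner and lines_touched, the row-major processor numbering (which matches the P6 and P14 examples), a 64-byte cache line, 4-byte floats, and partitions that start on a cache-line boundary.

```c
#include <assert.h>

/* Assumed sizes: N x N grid, SQRT_P x SQRT_P processors,
   N divisible by SQRT_P. */
enum { N = 64, SQRT_P = 4, BLK = N / SQRT_P };

/* Assumed cache geometry: 64-byte lines (16 floats if float is 4 bytes). */
enum { LINE_BYTES = 64, FLOATS_PER_LINE = LINE_BYTES / sizeof(float) };

/* 4D layout: [block row][block column][row in block][column in block].
   Each B[bi][bj] is one partition, stored contiguously. */
float B[SQRT_P][SQRT_P][BLK][BLK];

/* Hypothetical helper: address of logical element (i, j) of the
   N x N matrix inside the 4D array. */
float *elem(int i, int j)
{
    assert(i >= 0 && i < N && j >= 0 && j < N);
    return &B[i / BLK][j / BLK][i % BLK][j % BLK];
}

/* Hypothetical helper: processor owning logical element (i, j),
   assuming processors are numbered row by row. */
int owner(int i, int j)
{
    assert(i >= 0 && i < N && j >= 0 && j < N);
    return (i / BLK) * SQRT_P + (j / BLK);
}

/* Count the distinct cache lines covered by `count` floats that start
   at offset `start` within a partition and are `stride` floats apart.
   Assumes the partition starts on a line boundary and stride > 0. */
int lines_touched(int start, int stride, int count)
{
    int lines = 0;
    int last = -1;
    for (int k = 0; k < count; k++) {
        int line = (start + k * stride) / FLOATS_PER_LINE;
        if (line != last) {   /* offsets increase, so lines never repeat */
            lines++;
            last = line;
        }
    }
    return lines;
}
```

Under these assumptions, elem(53, 42) is the address of B[3][2][5][10] and owner(53, 42) is 14, as in the example above. Reading a neighbor's boundary row, lines_touched(0, 1, BLK), touches one cache line, while reading its rightmost column, lines_touched(BLK - 1, BLK, BLK), touches 16 lines to obtain only 16 useful floats. This is the extra data transfer noted for the left and right neighbors.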