Worse: False Sharing
- Arises when an algorithm is designed so poorly that two processors write to two different words within the same cache line at the same time
- The cache line then keeps moving back and forth between the two processors
- The processors are not really accessing or updating the same element; the elements they update merely happen to fall within one cache line: not true sharing, but false sharing
- In shared-memory programs, false sharing can easily cause severe performance degradation
- Easy to avoid: pad each processor's data out to the end of its cache line before starting the allocation of the data for the next processor (wastes memory, but improves performance)
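The padding remedy can be sketched in C++17 roughly as follows. This is only an illustration under stated assumptions: the 64-byte line size (`kCacheLine`), the struct names `PackedCounter` and `PaddedCounter`, and the helper `count_in_parallel` are all hypothetical and do not come from the slides. Threads stand in for processors, and `std::atomic` is used only to stop the compiler from collapsing the loop; no counter is ever touched by more than one thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines (typical on x86-64; check the target machine).
constexpr std::size_t kCacheLine = 64;

// Unpadded: 8 bytes each, so several neighbouring counters share one line.
struct PackedCounter {
    std::atomic<std::uint64_t> value{0};
};

// Padded: alignas rounds both alignment and size up to a full line,
// so every counter owns its cache line (56 bytes wasted per counter).
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

static_assert(sizeof(PaddedCounter) == kCacheLine,
              "each padded counter should fill exactly one cache line");

// Each thread increments only its own counter; returns the sum of all counters.
template <typename Counter>
std::uint64_t count_in_parallel(unsigned num_threads, std::uint64_t iters) {
    assert(num_threads > 0);
    std::vector<Counter> counters(num_threads);  // C++17 honours the over-alignment
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counters, t, iters] {
            for (std::uint64_t i = 0; i < iters; ++i) {
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();

    std::uint64_t total = 0;
    for (const auto& c : counters) total += c.value.load(std::memory_order_relaxed);
    return total;
}
```

Calling `count_in_parallel<PackedCounter>(n, iters)` and `count_in_parallel<PaddedCounter>(n, iters)` returns the same total; only the memory layout differs, so timing the two calls on a multi-core machine is one way to observe the cost of false sharing. Where the standard library provides it, `std::hardware_destructive_interference_size` from `<new>` could replace the hard-coded 64.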
Contention
- It is very easy to ignore contention effects when designing algorithms
- Can severely degrade performance by creating hot-spots
- Location hot-spot:
- Consider accumulating into a global variable: the accumulation takes place on a single node, i.e., every node accesses the variable allocated on that particular node whenever it tries to increment it
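A minimal shared-memory analogue of this location hot-spot, again in C++17, might look like the sketch below. The function names `accumulate_hot_spot` and `accumulate_locally` are hypothetical, and threads stand in for nodes. The second function shows one common mitigation (accumulate privately, then update the shared variable once); the slides do not prescribe it, so it is included only for contrast.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hot-spot version: every increment from every thread targets the single
// location that holds `total`, so all traffic converges on one place.
std::uint64_t accumulate_hot_spot(unsigned num_threads, std::uint64_t iters) {
    std::atomic<std::uint64_t> total{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&total, iters] {
            for (std::uint64_t i = 0; i < iters; ++i) {
                total.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    return total.load();
}

// Contrast (an assumption, not from the slides): each thread sums into a
// private variable and touches the shared location exactly once.
std::uint64_t accumulate_locally(unsigned num_threads, std::uint64_t iters) {
    std::atomic<std::uint64_t> total{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&total, iters] {
            std::uint64_t local = 0;
            for (std::uint64_t i = 0; i < iters; ++i) {
                local += 1;
            }
            total.fetch_add(local, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    return total.load();
}
```

Both functions return `num_threads * iters`. The first performs that many updates on the shared location, while the second performs only `num_threads` of them.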