Single Thread, Read Only Results Comparison Across Systems In Part 1, Part 2, Part 3, and Part 4, I reviewed performance issues for a single-thread program executing a long vector sum-reduction (a single-array, read-only computational kernel) on a 2-socket system with a pair of AMD Family10h Opteron Revision C2 (“Shanghai”) quad-core processors.… read more
Archives for November 2010
Optimizing AMD Opteron Memory Bandwidth, Part 4: single-thread, read-only
Following up on Part 1, Part 2, and Part 3, it is time to get into the ugly stuff: trying to control DRAM bank and rank access patterns and working to improve the effectiveness of the memory controller prefetcher. Background: Banks and Ranks The DRAM installed in the system… read more
Optimizing AMD Opteron Memory Bandwidth, Part 3: single-thread, read-only
Following up on Part 1 and Part 2, it is time to look at adding explicit prefetching to try to increase read bandwidth. About Prefetching The AMD Opteron Family10h processors have two different “hardware” prefetch mechanisms, and also allow “software” prefetch instructions. The “core prefetcher” is (as the name implies)… read more