A reader of this site asked me if I had a detailed breakdown of the components of memory latency for a modern microprocessor-based system. Since the only real data I have is confidential/proprietary and obsolete, I decided to try to build up a latency equation from memory…. Preliminary Comments: It…
Optimizing AMD Opteron Memory Bandwidth, Part 5: single-thread, read-only
Single Thread, Read Only Results Comparison Across Systems
In Part 1, Part 2, Part 3, and Part 4, I reviewed performance issues for a single-thread program executing a long vector sum-reduction (a single-array, read-only computational kernel) on a 2-socket system with a pair of AMD Family10h Opteron Revision C2 ("Shanghai") quad-core processors.…
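For readers who want something concrete, here is a minimal sketch of the kind of kernel described above: a single thread summing one long array, which reads every element once and writes nothing back. This is not the benchmark code used in the series. The function names `sum_reduce` and `read_bandwidth_gbs` are hypothetical, and the bandwidth helper assumes each 8-byte element is fetched from memory exactly once, so any extra DRAM traffic is not counted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sum-reduce a long vector of doubles.
 * Each element is loaded exactly once and the array is never written,
 * so the loop generates a single read-only memory stream. */
double sum_reduce(const double *a, size_t n)
{
    assert(a != NULL || n == 0);  /* an empty range may use a NULL pointer */
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i];              /* one 8-byte load per iteration */
    }
    return sum;
}

/* Hypothetical helper: convert an element count and elapsed wall-clock
 * time into read bandwidth in GB/s (10^9 bytes per second).
 * Assumes each double is fetched from memory exactly once, so traffic
 * such as unused prefetches is not included in the byte count. */
double read_bandwidth_gbs(size_t n, double seconds)
{
    assert(seconds > 0.0);
    double bytes = (double)n * (double)sizeof(double);
    return bytes / seconds / 1e9;
}
```

In an actual measurement the array has to be much larger than the last-level cache so that the loads really reach DRAM, and the call to `sum_reduce` would be bracketed by a wall-clock timer such as POSIX `clock_gettime`.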
Optimizing AMD Opteron Memory Bandwidth, Part 4: single-thread, read-only
Following up on Part 1, Part 2, and Part 3, it is time to get into the ugly stuff: trying to control DRAM bank and rank access patterns and working to improve the effectiveness of the memory controller prefetcher.
Background: Banks and Ranks
The DRAM installed in the system…