I am often asked what “Large Pages” in computer systems are good for. For commodity (x86_64) processors, “small pages” are 4KiB, while “large pages” are (typically) 2MiB. The size of the page controls how many bits are translated between virtual and physical addresses, and so represents a trade-off between what…
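To make the page-size arithmetic concrete, here is a minimal sketch of how the page size divides a virtual address into bits that must be translated and bits that pass through unchanged as the page offset. The 48-bit virtual address width is an assumption (it matches 4-level x86_64 paging, but is not stated in the post), and the function names are hypothetical, chosen only for illustration.

```python
# Sketch only: how page size splits a virtual address into translated bits
# and untranslated page-offset bits.
VIRTUAL_ADDRESS_BITS = 48  # assumption: 4-level x86_64 paging, not from the post

SMALL_PAGE = 4 * 1024          # 4KiB
LARGE_PAGE = 2 * 1024 * 1024   # 2MiB


def split_address_bits(page_size_bytes, va_bits=VIRTUAL_ADDRESS_BITS):
    """Return (translated_bits, offset_bits) for a power-of-two page size."""
    if page_size_bytes <= 0 or page_size_bytes & (page_size_bytes - 1):
        raise ValueError("page size must be a power of two")
    offset_bits = page_size_bytes.bit_length() - 1  # log2 of the page size
    return va_bits - offset_bits, offset_bits


def pages_needed(region_bytes, page_size_bytes):
    """Number of pages (and so translations) needed to map a region."""
    return -(-region_bytes // page_size_bytes)  # ceiling division


if __name__ == "__main__":
    one_gib = 1 << 30
    for name, size in (("4KiB", SMALL_PAGE), ("2MiB", LARGE_PAGE)):
        translated, offset = split_address_bits(size)
        print(f"{name}: {translated} bits translated, {offset} offset bits, "
              f"{pages_needed(one_gib, size)} pages per GiB")
```

Under these assumptions, a larger page leaves fewer address bits to translate and needs far fewer distinct translations to cover the same region of memory.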
memory bandwidth
Optimizing AMD Opteron Memory Bandwidth, Part 5: single-thread, read-only
Single Thread, Read Only Results Comparison Across Systems
In Part 1, Part 2, Part 3, and Part 4, I reviewed performance issues for a single-thread program executing a long vector sum-reduction (a single-array, read-only computational kernel) on a 2-socket system with a pair of AMD Family10h Opteron Revision C2 (“Shanghai”) quad-core processors. …
Optimizing AMD Opteron Memory Bandwidth, Part 4: single-thread, read-only
Following up on Part 1, Part 2, and Part 3, it is time to get into the ugly stuff: trying to control DRAM bank and rank access patterns and working to improve the effectiveness of the memory controller prefetcher.
Background: Banks and Ranks
The DRAM installed in the system…