Following up on Part 1 and Part 2, it is time to look at adding explicit prefetching to try to increase read bandwidth.

About Prefetching

The AMD Opteron Family10h processors have two different “hardware” prefetch mechanisms, and also allow “software” prefetch instructions. The “core prefetcher” is (as the name implies)…
memory bandwidth
Optimizing AMD Opteron Memory Bandwidth, Part 2: single-thread, read-only
In a previous entry, I started discussing the issues related to memory bandwidth for a read-only kernel on a sample AMD Opteron system. The naive implementation achieved 3.393 GB/s when compiled at “-O1” (hereafter “Version 001”) and 4.145 GB/s when compiled at “-O2” (hereafter “Version 002”). Today…
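
For a sense of scale, the two figures quoted above can be compared directly. The short Python sketch below only does that arithmetic; it is not the benchmark code from the series. The helper names `bandwidth_gbs` and `speedup` are hypothetical, and the sketch assumes 1 GB = 10^9 bytes, which the excerpt does not state.

```python
# Arithmetic sketch only: compares the two bandwidth figures quoted in the text.
# Assumption: 1 GB = 1e9 bytes (the excerpt does not say which convention is used).

def bandwidth_gbs(bytes_read: int, seconds: float) -> float:
    """Bandwidth in GB/s for a kernel that read `bytes_read` bytes in `seconds`."""
    return bytes_read / seconds / 1e9

def speedup(baseline_gbs: float, improved_gbs: float) -> float:
    """Ratio of the improved bandwidth to the baseline bandwidth."""
    return improved_gbs / baseline_gbs

V001_GBS = 3.393  # naive kernel compiled at -O1 ("Version 001")
V002_GBS = 4.145  # naive kernel compiled at -O2 ("Version 002")

ratio = speedup(V001_GBS, V002_GBS)
print(f"Version 002 / Version 001 = {ratio:.3f}")
```

The ratio works out to roughly 1.22, so the change from “-O1” to “-O2” alone is worth about 22% on this kernel.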