In a previous entry, I started discussing the issues related to memory bandwidth for a read-only kernel on a sample AMD Opteron system. The naive implementation gave a performance of 3.393 GB/s when compiled at “-O1” (hereafter “Version 001”) and 4.145 GB/s when compiled at “-O2” (hereafter “Version 002”). Today… read more