The High Performance LINPACK (HPL) benchmark is well known for delivering a high fraction of peak floating-point performance. The (historically) excellent scaling of performance as the number of processors is increased and as the frequency is increased suggests that memory bandwidth has not been a performance limiter. But this does… read more
Performance
Counting Stall Cycles on the Intel Sandy Bridge Processor
Intuition might suggest that defining what a “stall cycle” is on a processor should be relatively straightforward. For some processors, this is actually the case — particularly in-order processors with a very small number of execution units and a very small number of non-pipelined instructions. For modern out-of-order processors, coming… read more
Memory Bandwidth on Xeon Phi (Knights Corner)
A Quick Note There are a lot of topics that could be addressed here, but this short note will focus on bandwidth from main memory (using the STREAM benchmark) as a function of the number of threads used. Published STREAM Bandwidth Results Official STREAM submission at: http://www.cs.virginia.edu/stream/stream_mail/2013/0015.html Compiled with icc… read more