(These are old notes for relatively old systems — I just found this in my “drafts” folder and decided to switch the status to “public” so I can find it again!) Some notes on how to determine the DRAM and Memory Controller configuration for a system using AMD Opteron/Phenom or… read more
Memory Bandwidth Requirements of the HPL benchmark
The High Performance LINPACK (HPL) benchmark is well known for delivering a high fraction of peak floating-point performance. The (historically) excellent scaling of performance as the number of processors is increased and as the frequency is increased suggests that memory bandwidth has not been a performance limiter. But this does… read more
Counting Stall Cycles on the Intel Sandy Bridge Processor
Intuition might suggest that defining what a “stall cycle” is on a processor should be relatively straightforward. For some processors, this is actually the case — particularly in-order processors with a very small number of execution units and a very small number of non-pipelined instructions. For modern out-of-order processors, coming… read more