Performance Counters

July 23, 2018, Filed Under: Computer Architecture, Performance, Performance Counters

Comments on timing short code sections on Intel processors

(From a recent post of mine on the Intel software developer forums — some potentially useful words to go along with my new low-overhead-timers project…) Updates on 2019-01-23 in blue. There are lots of topics that you need to be aware of when attempting fine-grain timing. A few of the… read more

January 22, 2018, Filed Under: Computer Hardware, Performance, Performance Counters

A peculiar throughput limitation on Intel’s Xeon Phi x200 (Knights Landing)

A peculiar throughput limitation on Intel’s Xeon Phi x200 (Knights Landing) Introduction: In December 2017, my colleague Damon McDougall (now at AMD) asked for help in porting the fused multiply-add example code from a Colfax report (https://colfaxresearch.com/skl-avx512/) to the Xeon Phi x200 (Knights Landing) processors here at TACC. There was… read more

June 4, 2014, Filed Under: Computer Architecture, Performance, Performance Counters

Counting Stall Cycles on the Intel Sandy Bridge Processor

Intuition might suggest that defining what a “stall cycle” is on a processor should be relatively straightforward. For some processors, this is actually the case — particularly in-order processors with a very small number of execution units and a very small number of non-pipelined instructions. For modern out-of-order processors, coming… read more

About