Here are the annotated slides from my SC18 presentation on Snoop Filter Conflicts that cause performance variability in HPL and DGEMM on the Xeon Platinum 8160 processor. This slide presentation includes data (not included in the paper) showing that Snoop Filter Conflicts occur in all Intel Scalable Processors (a.k.a., “Skylake…
Computer Architecture
Why I hate MPI (from a performance analysis perspective)
According to Dr. Bandwidth, performance analysis has two recurring themes: How fast should this code (or “simple” variations on this code) run on this hardware? If I am analyzing (apparent) performance shortfalls, how can I distinguish between cause and effect? For very simple codes, it may be possible to do…
Comments on timing short code sections on Intel processors
(From a recent post of mine on the Intel software developer forums, with some potentially useful words to go along with my new low-overhead-timers project…) Updates on 2019-01-23 in blue. There are lots of topics that you need to be aware of when attempting fine-grain timing. A few of the…