Intel provides nice schematic diagrams of the layouts of their processor chips, but provides no guidance on how the user-visible core numbers and L3 slice numbers map to the locations on the die. Most of the time there is no “need” to know the locations of the units, but there…
The Surprising Effectiveness of Non-Overlapping, Sensitivity-Based Performance Models
This was a keynote presentation at the “2nd International Workshop on Performance Modeling: Methods and Applications” (PMMA16), June 23, 2016, Frankfurt, Germany (in conjunction with ISC16). The presentation discusses a family of simple performance models that I developed over the last 20 years, originally in support of processor and…
Timing Methodology for MPI Programs
While working on the implementation of the MPI version of the STREAM benchmark, I realized that there were some subtleties in timing that could easily lead to inaccurate and/or misleading results. This post is a transcription of my notes as I looked at the issues…. Primary requirement: I want a…