John McCalpin's blog – Page 3 – Dr. Bandwidth explains all….

May 27, 2021, Filed Under: Computer Architecture, Computer Hardware, Performance Counters

Die Locations of Cores and L3 Slices for Intel Xeon Processors

Intel provides nice schematic diagrams of the layouts of their processor chips, but provides no guidance on how the user-visible core numbers and L3 slice numbers map to the locations on the die. Most of the time there is no “need” to know the locations of the units, but there…


April 2, 2020, Filed Under: Computer Architecture, Performance

The Surprising Effectiveness of Non-Overlapping, Sensitivity-Based Performance Models

This was a keynote presentation at the “2nd International Workshop on Performance Modeling: Methods and Applications” (PMMA16), June 23, 2016, Frankfurt, Germany (in conjunction with ISC16). The presentation discusses a family of simple performance models that I developed over the last 20 years, originally in support of processor and…


March 4, 2019, Filed Under: Performance, Reference

Timing Methodology for MPI Programs

While working on the implementation of the MPI version of the STREAM benchmark, I realized that there were some subtleties in timing that could easily lead to inaccurate and/or misleading results. This post is a transcription of my notes as I looked at the issues…. Primary requirement: I want a…
