This was a keynote presentation at the “2nd International Workshop on Performance Modeling: Methods and Applications” (PMMA16), June 23, 2016, Frankfurt, Germany (in conjunction with ISC16). The presentation discusses a family of simple performance models that I developed over the last 20 years — originally in support of processor and… read more
Timing Methodology for MPI Programs
While working on the implementation of the MPI version of the STREAM benchmark, I realized that there were some subtleties in timing that could easily lead to inaccurate and/or misleading results. This post is a transcription of my notes as I looked at the issues…. Primary requirement: I want a… read more
Intel’s future “CLDEMOTE” instruction
I recently saw a reference to a future Intel “Atom” core called “Tremont” and ran across an interesting new instruction, “CLDEMOTE”, that will be supported in “Future Tremont and later” microarchitectures (ref: “Intel® Architecture Instruction Set Extensions and Future Features Programming Reference”, document 319433-035, October 2018). The “CLDEMOTE” instruction is… read more