Reference

March 4, 2019, Filed Under: Performance, Reference

Timing Methodology for MPI Programs

While working on the implementation of the MPI version of the STREAM benchmark, I realized that there were some subtleties in timing that could easily lead to inaccurate and/or misleading results. This post is a transcription of my notes as I looked at the issues…. Primary requirement: I want a…

September 17, 2018, Filed Under: Performance, Performance Counters, Reference

Using hardware performance counters to determine how often both logical processors are active on an Intel CPU

Most Intel microprocessors support “HyperThreading” (Intel’s trademark for their implementation of “simultaneous multithreading”), which allows the hardware to support (typically) two “Logical Processors” for each physical core. Processes running on the two Logical Processors share most of the processor resources (particularly caches and execution units). Some workloads (particularly heterogeneous…

January 1, 2018, Filed Under: Cache Coherence Implementations, Cache Coherence Protocols, Computer Architecture, Computer Hardware, Reference

Notes on “non-temporal” (aka “streaming”) stores

Memory systems using caches have a lot more potential flexibility than most implementations are able to exploit – you get the standard behavior all the time, even if an alternative behavior would be allowable and desirable in a specific circumstance. One area in which many vendors have provided an alternative…