While working on the implementation of the MPI version of the STREAM benchmark, I realized that there were some subtleties in timing that could easily lead to inaccurate and/or misleading results. This post is a transcription of my notes as I looked at the issues…. Primary requirement: I want a… read more
Reference
Using hardware performance counters to determine how often both logical processors are active on an Intel CPU
Most Intel microprocessors support “HyperThreading” (Intel’s trademark for their implementation of “simultaneous multithreading”) — which allows the hardware to support (typically) two “Logical Processors” for each physical core. Processes running on the two Logical Processors share most of the processor resources (particularly caches and execution units). Some workloads (particularly heterogeneous… read more
Notes on “non-temporal” (aka “streaming”) stores
Memory systems using caches have a lot more potential flexibility than most implementations are able to exploit – you get the standard behavior all the time, even if an alternative behavior would be allowable and desirable in a specific circumstance. One area in which many vendors have provided an alternative… read more