John McCalpin's blog – Page 10 – Dr. Bandwidth explains all….

December 5, 2013, Filed Under: Computer Architecture, Performance

Memory Bandwidth on Xeon Phi (Knights Corner)

A Quick Note There are a lot of topics that could be addressed here, but this short note will focus on bandwidth from main memory (using the STREAM benchmark) as a function of the number of threads used. Published STREAM Bandwidth Results Official STREAM submission at: http://www.cs.virginia.edu/stream/stream_mail/2013/0015.html Compiled with icc… read more

July 14, 2013, Filed Under: Computer Hardware, Performance, Performance Counters

Notes on the mystery of hardware cache performance counters

In response to a question on the PAPI mailing list, I scribbled some notes to try to help users understand the complexity of hardware performance counters for cache accesses and cache misses, and thought they might be helpful here…. For any interpretation of specific hardware performance counter events, it is… read more

May 30, 2013, Filed Under: Accelerated Computing, Computer Hardware, Linux

Coherence with Cached Memory-Mapped IO

In response to my previous blog entry, a question was asked about how to manage coherence for cached memory-mapped IO regions. Here are some more details… Maintaining Coherence with Cached Memory-Mapped IO For the “read-only” range, cached copies of MMIO lines will never be invalidated by external traffic, so repeated… read more