cache

September 11, 2014, Filed Under: Algorithms, Performance

Memory Bandwidth Requirements of the HPL benchmark

The High Performance LINPACK (HPL) benchmark is well known for delivering a high fraction of peak floating-point performance. Its (historically) excellent performance scaling with both processor count and clock frequency suggests that memory bandwidth has not been a performance limiter. But this does… read more

July 14, 2013, Filed Under: Computer Hardware, Performance, Performance Counters

Notes on the mystery of hardware cache performance counters

In response to a question on the PAPI mailing list, I scribbled some notes to help users understand the complexity of hardware performance counters for cache accesses and cache misses, and thought they might be helpful here. For any interpretation of specific hardware performance counter events, it is… read more

May 29, 2013, Filed Under: Accelerated Computing, Computer Hardware, Linux

Notes on Cached Access to Memory-Mapped IO Regions

When attempting to build heterogeneous computers with “accelerators” or “coprocessors” on PCIe interfaces, one quickly runs into asymmetries between the data transfer capabilities of processors and IO devices. These asymmetries are often surprising — the tremendously complex processor is actually less capable of generating precisely controlled high-performance IO transactions than… read more