arithmetic

January 22, 2018, Filed Under: Computer Hardware, Performance, Performance Counters

A peculiar throughput limitation on Intel’s Xeon Phi x200 (Knights Landing)

A peculiar throughput limitation on Intel’s Xeon Phi x200 (Knights Landing) Introduction: In December 2017, my colleague Damon McDougall (now at AMD) asked for help in porting the fused multiply-add example code from a Colfax report (https://colfaxresearch.com/skl-avx512/) to the Xeon Phi x200 (Knights Landing) processors here at TACC. There was… read more

November 5, 2016, Filed Under: Algorithms, Computer Architecture, Computer Hardware, Performance

Intel discloses “vector+SIMD” instructions for future processors

The art and science of microprocessor architecture is a never-ending struggling to balance complexity, verifiability, usability, expressiveness, compactness, ease of encoding/decoding, energy consumption, backwards compatibility, forwards compatibility, and other factors. In recent years the trend has been to increase core-level performance by the use of SIMD vector instructions, and… read more

September 11, 2014, Filed Under: Algorithms, Performance

Memory Bandwidth Requirements of the HPL benchmark

The High Performance LINPACK (HPL) benchmark is well known for delivering a high fraction of peak floating-point performance. The (historically) excellent scaling of performance as the number of processors is increased and as the frequency is increased suggests that memory bandwidth has not been a performance limiter. But this does… read more