The art and science of microprocessor architecture is a never-ending struggle to balance complexity, verifiability, usability, expressiveness, compactness, ease of encoding/decoding, energy consumption, backwards compatibility, forwards compatibility, and other factors. In recent years the trend has been to increase core-level performance through the use of SIMD vector instructions, and…
Algorithms
Memory Bandwidth Requirements of the HPL benchmark
The High Performance LINPACK (HPL) benchmark is well known for delivering a high fraction of peak floating-point performance. The (historically) excellent scaling of performance as the number of processors is increased and as the frequency is increased suggests that memory bandwidth has not been a performance limiter. But this does…
Is “ordered summation” a hard problem to speed up?
Sometimes things that seem incredibly difficult aren’t really that bad. I have been reviewing technology challenges for “exascale” computing and ran across an interesting comment in the 2008 “Technology Challenges in Achieving Exascale Systems” report. In Section 5.8 “Application Assessments”, Figure 5.16 on page 82 places “Ordered Summation” in the…