Welcome to the University of Texas blog of John D. McCalpin, PhD — aka Dr. Bandwidth.
This is the beginning of a serious of posts on the exciting topic of memory bandwidth in computer systems! I hope these posts will serve as a reference for folks interested in the increasingly complex issues regarding memory bandwidth — it seems silly for me to write all this stuff down for my own use when I suspect that there is at least one person out there who may find this information useful.
Although I currently work at the Texas Advanced Computing Center of the University of Texas at Austin, my professional career has been split about 50-50 between academia and the computer hardware industry. These blog posts will primarily deal with concrete information about real systems that (in my experience) does not fit well with the priorities of most academic publications. Most journals or conferences require some level of “research” or “novelty”, while most of these notes are more explanatory in nature. That is not to say that none of this will ever be published, but much of the material to be posted here is valuable for other reasons.
My interest in memory bandwidth extends back to the late 1980’s, leading to the development of the STREAM Benchmark which I have maintained since 1991. Here is what I have to say about STREAM:
What is STREAM?
The STREAM benchmark is a simple synthetic benchmark program that measures sustainable memory bandwidth (in MB/s) and the corresponding computation rate for simple vector kernels.
Why should I care?
Computer CPUs are getting faster much more quickly than computer memory systems. As this progresses, more and more programs will be limited in performance by the memory bandwidth of the system, rather than by the computational performance of the CPU.
As an extreme example, most current high-end machines run simple arithmetic kernels for out-of-cache operands at 1-2% of their rated peak speeds — that means that they are spending 98-99% of their time idle and waiting for cache misses to be satisfied.
The STREAM benchmark is specifically designed to work with datasets much larger than the available cache on any given system, so that the results are (presumably) more indicative of the performance of very large, vector style applications.