January 5, 2013, Filed Under: Performance

Counting binary vs decimal powers in the STREAM benchmark

A question came up recently about my choice of definitions for “MB” used in the computation of memory bandwidth (in “MB/s”) in the STREAM benchmark.

According to this reference from NIST, the convention is:

Binary Powers    Value            Abbreviation   Full name
2^10             1,024            KiB            kibibyte
2^20             1,048,576        MiB            mebibyte
2^30             1,073,741,824    GiB            gibibyte

Decimal Powers   Value            Abbreviation   Full name
10^3             1,000            kB             kilobyte
10^6             1,000,000        MB             megabyte
10^9             1,000,000,000    GB             gigabyte

Since its inception in 1991, the STREAM benchmark has reported the amount of memory used in MiB (2^20 Bytes) and, more recently, GiB (2^30 Bytes), but it has always reported transfer rates in MB/s (10^6 Bytes per second).
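As a minimal sketch of this mixed convention (not the STREAM source itself), the snippet below reports the footprint of one array in MiB and the rate of a copy-style kernel in MB/s. The function names, the array length, and the elapsed time are hypothetical, and the sketch assumes 8-byte elements with one read and one write counted per element.

```python
MIB = 2**20   # binary "mega": used here for memory footprint
MB = 10**6    # decimal "mega": used here for transfer rate


def array_footprint_mib(n_elements, bytes_per_element=8):
    """Memory occupied by one array, in MiB (2^20 Bytes)."""
    return n_elements * bytes_per_element / MIB


def copy_rate_mb_per_s(n_elements, seconds, bytes_per_element=8):
    """Transfer rate of a copy kernel a[i] = b[i], in MB/s (10^6 Bytes/s).

    Assumption: each element is read once and written once, so
    2 * bytes_per_element Bytes are counted per iteration.
    """
    bytes_moved = 2 * bytes_per_element * n_elements
    return bytes_moved / seconds / MB


n = 10_000_000    # hypothetical array length
elapsed = 0.016   # hypothetical measured time, in seconds

print(f"Memory per array = {array_footprint_mib(n):.1f} MiB")
print(f"Copy rate        = {copy_rate_mb_per_s(n, elapsed):.1f} MB/s")
```

The two constants are kept separate on purpose: the footprint and the rate answer different questions, so each uses the unit that is conventional for its own context.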

An example may make my motivation clearer.

Suppose a computer system reads 524,288 Bytes and writes 524,288 Bytes in 1.000 seconds, for a total of 1,048,576 Bytes transferred in 1.000 seconds.
The corresponding performance could be reported in a variety of ways:

  • Option 1: report as 1,048,576 Bytes/s
  • Option 2: report as 1.000000 MiB/s
  • Option 3: report as 1.049 MB/s

From my perspective:

  • Option 1 gives inconveniently large numbers.
  • Option 2 is consistent with typical units for memory storage, but:
    • it is not consistent with typical units for counting arithmetic operations (more on that below), and
    • it would allow unscrupulous parties (or simply parties with different opinions about how to “properly” count) to change the definition of “MB” from 2^20 to 10^6, allowing them to report values that were almost 4.9% higher (2^20/10^6 = 1.048576) than the *same* performance on other systems.
  • Option 3 is what I chose. It is consistent with how FLOPS are counted and it preempts the potential “performance inflation” from abusing Option 2.
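The arithmetic behind the three options, and behind the "almost 4.9%" figure, can be checked with a few lines. This is only an illustrative sketch using the numbers from the example above; the variable names are my own.

```python
bytes_read = 524_288
bytes_written = 524_288
seconds = 1.000

total_bytes = bytes_read + bytes_written   # 1,048,576 Bytes
rate_bytes = total_bytes / seconds         # Option 1: Bytes/s
rate_mib = rate_bytes / 2**20              # Option 2: MiB/s (binary)
rate_mb = rate_bytes / 10**6               # Option 3: MB/s (decimal)

# How much larger the number looks if a 2^20-based "MB" is silently
# replaced by a 10^6-based one for the same measured transfer.
inflation_pct = (2**20 / 10**6 - 1) * 100

print(f"Option 1: {rate_bytes:,.0f} Bytes/s")
print(f"Option 2: {rate_mib:.6f} MiB/s")
print(f"Option 3: {rate_mb:.3f} MB/s")
print(f"Inflation from redefining MB: {inflation_pct:.2f}%")
```

The last line prints 4.86%, which is the gap that a fixed decimal definition removes from the comparison.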

Note that if floating-point arithmetic operation counts define “MFLOPS” as 10^6 FP Ops/s (as is typical), then “balance” ratios of (MB/s)/MFLOPS require that (MB/s) also be defined using a decimal base.
These “balance” ratios are an important output of the STREAM benchmark project.
(Aside: I would not encourage anyone to consider a 5% difference in “balance” to mean very much; these are intended as relatively coarse scaling estimates.)
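To illustrate why the two "mega" prefixes need to match, here is a small sketch that computes the (MB/s)/MFLOPS ratio once with decimal units on both sides and once with a binary "MB" against a decimal MFLOPS. The `balance` helper and the bandwidth and FLOPS figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def balance(bytes_per_s, flops_per_s, mega_bytes=10**6, mega_flops=10**6):
    """Return (MB/s) / MFLOPS for the given definitions of "mega"."""
    return (bytes_per_s / mega_bytes) / (flops_per_s / mega_flops)


bandwidth = 10.0e9   # hypothetical sustained bandwidth, Bytes/s
fp_rate = 20.0e9     # hypothetical arithmetic rate, FP Ops/s

consistent = balance(bandwidth, fp_rate)               # decimal / decimal
mixed = balance(bandwidth, fp_rate, mega_bytes=2**20)  # binary / decimal

print(f"decimal MB/s over decimal MFLOPS: {consistent:.4f}")
print(f"binary 'MB'/s over decimal MFLOPS: {mixed:.4f}")
print(f"ratio between the two: {consistent / mixed:.6f}")
```

With matching decimal prefixes the two factors of 10^6 cancel and the ratio is simply Bytes per FP Op; with mismatched prefixes the result is off by the factor 1.048576.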
