John McCalpin's blog

Dr. Bandwidth explains all….

New Year’s Updates

Posted by John D. McCalpin, Ph.D. on January 9, 2019

As part of my attempt to become organized in 2019, I found several draft blog entries that had never been completed and made public.

This week I updated three of those posts — two really old ones (primarily of interest to computer architecture historians), and one from 2018:


One Response to “New Year’s Updates”

  1.   anon Says:

    Short comment re: http://sites.utexas.edu/jdm4372/2018/07/23/comments-on-timing-short-code-sections-on-intel-processors/

    On most Intel processors, the branch predictor remembers a distribution over short (length ~30) subsequences of branch outcomes. This is often enough to reconstruct much longer sequences of branches, similar to how short reads can be used to assemble a long genome. A typical Intel processor can almost perfectly predict a periodic branching pattern of period 2000 whose outcomes are random with 1:1 odds, and a benchmarking loop induces a periodic pattern. See https://discourse.julialang.org/t/psa-microbenchmarks-remember-branch-history/17436 for a discussion in the Julia Discourse forum.

    Depending on context, a mispredicted branch can be much more expensive than expected: it can lead speculative execution down a rabbit hole that eats memory bandwidth, replaces good cache entries with garbage, and misses the opportunity to fetch the correct lines. This gets worse if the speculative execution window is especially long (the mispredicted branch is waiting on memory before it can resolve).

    Sorry for replying here. The comment section of your relevant post was already closed (feel free to move this reply there).
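
The comment's point about short histories can be illustrated with a toy simulation. The sketch below is not a model of any real Intel predictor: it assumes a hypothetical table of signed counters indexed by the last `history_len` branch outcomes, and replays a random 1:1 pattern of period 2000 several times, the way a benchmarking loop would. The names `make_pattern` and `misprediction_rate` are invented for this example.

```python
import random
from collections import defaultdict


def make_pattern(period, seed=0):
    """Return a fixed list of random taken/not-taken outcomes (1:1 odds)."""
    rng = random.Random(seed)
    return [rng.random() < 0.5 for _ in range(period)]


def misprediction_rate(pattern, history_len, repeats):
    """Replay `pattern` `repeats` times through a toy predictor.

    The predictor keeps one signed counter per recent-history value
    (the last `history_len` outcomes packed into an integer).
    Returns the fraction of mispredictions on the final pass only.
    """
    mask = (1 << history_len) - 1
    counters = defaultdict(int)  # history -> (times taken) - (times not taken)
    history = 0
    misses = 0
    for _ in range(repeats):
        misses = 0  # only the last pass is reported
        for taken in pattern:
            predicted_taken = counters[history] >= 0
            if predicted_taken != taken:
                misses += 1
            # Train the counter, then shift the outcome into the history.
            counters[history] += 1 if taken else -1
            history = ((history << 1) | int(taken)) & mask
    return misses / len(pattern)


if __name__ == "__main__":
    pattern = make_pattern(2000)
    for history_len in (4, 30):
        rate = misprediction_rate(pattern, history_len, repeats=3)
        print(f"history_len={history_len}: miss rate on final pass = {rate:.3f}")
```

With a 30-outcome history, nearly every position in the 2000-long period has a unique recent history, so once the pattern has been replayed a couple of times this toy predictor rarely misses. With a 4-outcome history the same pattern stays close to a coin flip. That is the sense in which a repeated benchmark input can hide misprediction costs that fresh data would expose.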
