• Skip to main content
  • Skip to primary sidebar
UT Shield
The University of Texas at Austin

cache

January 7, 2019, Filed Under: Cache Coherence Implementations, Cache Coherence Protocols, Computer Architecture, Linux, Performance Counters

SC18 paper: HPL and DGEMM performance variability on Intel Xeon Platinum 8160 processors

Here are the annotated slides from my SC18 presentation on Snoop Filter Conflicts that cause performance variability in HPL and DGEMM on the Xeon Platinum 8160 processor. This slide presentation includes data (not included in the paper) showing that Snoop Filter Conflicts occur in all Intel Scalable Processors (a.k.a., “Skylake… read more 

January 1, 2018, Filed Under: Cache Coherence Implementations, Cache Coherence Protocols, Computer Architecture, Computer Hardware, Reference

Notes on “non-temporal” (aka “streaming”) stores

Memory systems using caches have a lot more potential flexibility than most implementations are able to exploit – you get the standard behavior all the time, even if an alternative behavior would be allowable and desirable in a specific circumstance.  One area in which many vendors have provided an alternative… read more 

October 16, 2016, Filed Under: Computer Architecture, Computer Hardware, Performance

Invited Talk at SuperComputing 2016!

“Memory Bandwidth and System Balance in HPC Systems” If you are planning to attend the SuperComputing 2016 conference in Salt Lake City next month, be sure to reserve a spot on your calendar for my talk on Wednesday afternoon (4:15pm-5:00pm). I will be talking about the technology and market trends… read more 

  • « Go to Previous Page
  • Page 1
  • Page 2
  • Page 3
  • Go to Next Page »

Primary Sidebar

Recent Posts

  • Single-core memory bandwidth: Latency, Bandwidth, and Concurrency
  • Dr. Bandwidth is moving on…
  • The evolution of single-core bandwidth in multicore systems — update
  • “Memory directories” in Intel processors
  • The evolution of single-core bandwidth in multicore processors

Tags

accelerated computing arithmetic cache communication configuration coprocessor Distributed cache DRAM Hash functions high performance computing Knights Landing memory bandwidth memory latency microprocessors MMIO MTRR Multicore processors Opteron STREAM benchmark synchronization TLB Virtual Memory Xeon Phi

UT Home | Emergency Information | Site Policies | Web Accessibility | Web Privacy | Adobe Reader

© The University of Texas at Austin 2025