The University of Texas at Austin

high performance computing

August 1, 2018, Filed Under: Computer Architecture, Computer Hardware, Performance

Why I hate MPI (from a performance analysis perspective)

According to Dr. Bandwidth, performance analysis has two recurring themes: (1) How fast should this code (or “simple” variations on this code) run on this hardware? (2) If I am analyzing (apparent) performance shortfalls, how can I distinguish between cause and effect? For very simple codes, it may be possible to do…

January 22, 2018, Filed Under: Computer Hardware, Performance, Performance Counters

A peculiar throughput limitation on Intel’s Xeon Phi x200 (Knights Landing)

Introduction: In December 2017, my colleague Damon McDougall (now at AMD) asked for help in porting the fused multiply-add example code from a Colfax report (https://colfaxresearch.com/skl-avx512/) to the Xeon Phi x200 (Knights Landing) processors here at TACC. There was…

November 22, 2016, Filed Under: Computer Architecture, Computer Hardware

SC16 Invited Talk: Memory Bandwidth and System Balance in HPC Systems

I have been involved in HPC for over 30 years: 12 years as a student and faculty user in ocean modeling, 12 years as a performance analyst and system architect at SGI, IBM, and AMD, and over 7 years as a research scientist at TACC. This history is based on my…

© The University of Texas at Austin 2025