The University of Texas at Austin

Performance

September 17, 2018, Filed Under: Performance, Performance Counters, Reference

Using hardware performance counters to determine how often both logical processors are active on an Intel CPU

Most Intel microprocessors support “Hyper-Threading” (Intel’s trademark for its implementation of “simultaneous multithreading”), which allows the hardware to support (typically) two “Logical Processors” for each physical core. Processes running on the two Logical Processors share most of the processor resources (particularly caches and execution units). Some workloads (particularly heterogeneous…
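One way to turn raw counter values into the quantity in the title is inclusion-exclusion. The sketch below is a hypothetical helper, not the method from the post. It assumes you already have three counts collected over the same interval and in the same core clock cycles: unhalted cycles for logical processor 0, unhalted cycles for logical processor 1, and cycles in which at least one of the two was unhalted (for example, an event counted with Intel's AnyThread qualifier, on processors that provide it). The function name and argument names are illustrative only.

```python
def both_active_fraction(unhalted_lp0: int, unhalted_lp1: int, unhalted_any: int) -> float:
    """Return the fraction of core-active cycles in which BOTH logical
    processors were unhalted.

    Hypothetical inputs (same interval, same core clock):
      unhalted_lp0: cycles logical processor 0 was unhalted
      unhalted_lp1: cycles logical processor 1 was unhalted
      unhalted_any: cycles at least one of the two was unhalted
    """
    if unhalted_any <= 0:
        raise ValueError("unhalted_any must be positive")
    # Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
    both = unhalted_lp0 + unhalted_lp1 - unhalted_any
    # Counters are read at slightly different times, so clamp small
    # inconsistencies back into the physically possible range.
    both = max(0, min(both, unhalted_lp0, unhalted_lp1))
    return both / unhalted_any
```

For example, with 800 and 600 unhalted cycles on the two logical processors and 1000 cycles in which the core had at least one active thread, both logical processors were active for 400 cycles, so `both_active_fraction(800, 600, 1000)` returns 0.4; the remaining 600 cycles had exactly one active thread.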

August 1, 2018, Filed Under: Computer Architecture, Computer Hardware, Performance

Why I hate MPI (from a performance analysis perspective)

According to Dr. Bandwidth, performance analysis has two recurring themes: (1) How fast should this code (or “simple” variations on this code) run on this hardware? (2) If I am analyzing (apparent) performance shortfalls, how can I distinguish between cause and effect? For very simple codes, it may be possible to do…

July 23, 2018, Filed Under: Computer Architecture, Performance, Performance Counters

Comments on timing short code sections on Intel processors

(From a recent post of mine on the Intel software developer forums: some potentially useful words to go along with my new low-overhead-timers project…) Updates on 2019-01-23 in blue. There are many issues you need to be aware of when attempting fine-grain timing. A few of the…
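A common first step in fine-grain timing is to find out what the timer itself costs. The sketch below is an illustration, not code from the low-overhead-timers project: it reads a timer twice in a row many times and reports the smallest and the median gap between the two reads. It uses Python's `time.perf_counter_ns()` purely for portability, so the result includes interpreter overhead; native code reading the processor's time-stamp counter would be far cheaper. The function name `timer_overhead_ns` and its `trials` parameter are hypothetical.

```python
import time


def timer_overhead_ns(trials: int = 100_000) -> tuple[int, int]:
    """Estimate the cost of back-to-back time.perf_counter_ns() reads.

    Returns (minimum_gap_ns, median_gap_ns) over `trials` pairs of reads.
    """
    if trials < 1:
        raise ValueError("trials must be at least 1")
    gaps = []
    for _ in range(trials):
        t0 = time.perf_counter_ns()
        t1 = time.perf_counter_ns()  # nothing in between: measures the timer itself
        gaps.append(t1 - t0)
    gaps.sort()
    # The minimum filters out interrupts and other interference;
    # the median shows the typical cost of one timer read plus loop overhead.
    return gaps[0], gaps[len(gaps) // 2]


if __name__ == "__main__":
    lo, med = timer_overhead_ns()
    print(f"min {lo} ns, median {med} ns between back-to-back reads")
```

Any code section shorter than a small multiple of the reported minimum is hard to time meaningfully with that timer, which is one reason fine-grain timing needs care.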

Recent Posts

  • Single-core memory bandwidth: Latency, Bandwidth, and Concurrency
  • Dr. Bandwidth is moving on…
  • The evolution of single-core bandwidth in multicore systems — update
  • “Memory directories” in Intel processors
  • The evolution of single-core bandwidth in multicore processors

Tags

accelerated computing, arithmetic, cache, communication, configuration, coprocessor, Distributed cache, DRAM, Hash functions, high performance computing, Knights Landing, memory bandwidth, memory latency, microprocessors, MMIO, MTRR, Multicore processors, Opteron, STREAM benchmark, synchronization, TLB, Virtual Memory, Xeon Phi

© The University of Texas at Austin 2025