The University of Texas at Austin

Computer Hardware

July 14, 2013, Filed Under: Computer Hardware, Performance, Performance Counters

Notes on the mystery of hardware cache performance counters

In response to a question on the PAPI mailing list, I scribbled some notes to try to help users understand the complexity of hardware performance counters for cache accesses and cache misses, and thought they might be helpful here. For any interpretation of specific hardware performance counter events, it is… read more

May 30, 2013, Filed Under: Accelerated Computing, Computer Hardware, Linux

Coherence with Cached Memory-Mapped IO

In response to my previous blog entry, a question was asked about how to manage coherence for cached memory-mapped IO regions. Here are some more details…

Maintaining Coherence with Cached Memory-Mapped IO

For the “read-only” range, cached copies of MMIO lines will never be invalidated by external traffic, so repeated… read more

May 29, 2013, Filed Under: Accelerated Computing, Computer Hardware, Linux

Notes on Cached Access to Memory-Mapped IO Regions

When attempting to build heterogeneous computers with “accelerators” or “coprocessors” on PCIe interfaces, one quickly runs into asymmetries between the data transfer capabilities of processors and IO devices. These asymmetries are often surprising: the tremendously complex processor is actually less capable of generating precisely controlled high-performance IO transactions than… read more



© The University of Texas at Austin 2025