• Skip to main content
  • Skip to primary sidebar
UT Shield
The University of Texas at Austin

Computer Hardware

May 27, 2021, Filed Under: Computer Architecture, Computer Hardware, Performance Counters

Die Locations of Cores and L3 Slices for Intel Xeon Processors

Intel provides nice schematic diagrams of the layouts of their processor chips, but provides no guidance on how the user-visible core numbers and L3 slice numbers map to the locations on the die. Most of the time there is no “need” to know the locations of the units, but there… read more 

February 18, 2019, Filed Under: Cache Coherence Implementations, Cache Coherence Protocols, Computer Architecture

Intel’s future “CLDEMOTE” instruction

I recently saw a reference to a future Intel “Atom” core called “Tremont” and ran across an interesting new instruction, “CLDEMOTE”, that will be supported in “Future Tremont and later” microarchitectures (ref: “Intel® Architecture Instruction Set Extensions and Future Features Programming Reference”, document 319433-035, October 2018). The “CLDEMOTE” instruction is… read more 

January 7, 2019, Filed Under: Cache Coherence Implementations, Cache Coherence Protocols, Computer Architecture, Linux, Performance Counters

SC18 paper: HPL and DGEMM performance variability on Intel Xeon Platinum 8160 processors

Here are the annotated slides from my SC18 presentation on Snoop Filter Conflicts that cause performance variability in HPL and DGEMM on the Xeon Platinum 8160 processor. This slide presentation includes data (not included in the paper) showing that Snoop Filter Conflicts occur in all Intel Scalable Processors (a.k.a., “Skylake… read more 

  • « Go to Previous Page
  • Page 1
  • Page 2
  • Page 3
  • Page 4
  • Page 5
  • Interim pages omitted …
  • Page 13
  • Go to Next Page »

Primary Sidebar

Recent Posts

  • Single-core memory bandwidth: Latency, Bandwidth, and Concurrency
  • Dr. Bandwidth is moving on…
  • The evolution of single-core bandwidth in multicore systems — update
  • “Memory directories” in Intel processors
  • The evolution of single-core bandwidth in multicore processors

Tags

accelerated computing arithmetic cache communication configuration coprocessor Distributed cache DRAM Hash functions high performance computing Knights Landing memory bandwidth memory latency microprocessors MMIO MTRR Multicore processors Opteron STREAM benchmark synchronization TLB Virtual Memory Xeon Phi

UT Home | Emergency Information | Site Policies | Web Accessibility | Web Privacy | Adobe Reader

© The University of Texas at Austin 2025