microprocessors

August 28, 2023, Filed Under: Cache Coherence Implementations, Cache Coherence Protocols, Computer Architecture

“Memory directories” in Intel processors

One of the (many) minimally documented features of recent Intel processor implementations is the “memory directory”. This is used in multi-socket systems to reduce cache coherence traffic between sockets. I have referred to this in various presentations as: “A Memory Directory is one or more bits per cache line… read more

September 10, 2021, Filed Under: Cache Coherence Implementations, Computer Architecture

Mapping addresses to L3/CHA slices in Intel processors

Starting with the Xeon E5 processors “Sandy Bridge EP” in 2012, all of Intel’s mainstream multicore server processors have included a distributed L3 cache with distributed coherence processing. The L3 cache is divided into “slices”, which are distributed around the chip — typically one “slice” for each processor core. Each… read more

January 22, 2018, Filed Under: Computer Hardware, Performance, Performance Counters

A peculiar throughput limitation on Intel’s Xeon Phi x200 (Knights Landing)

A peculiar throughput limitation on Intel’s Xeon Phi x200 (Knights Landing) Introduction: In December 2017, my colleague Damon McDougall (now at AMD) asked for help in porting the fused multiply-add example code from a Colfax report (https://colfaxresearch.com/skl-avx512/) to the Xeon Phi x200 (Knights Landing) processors here at TACC. There was… read more