A Quick Note There are a lot of topics that could be addressed here, but this short note will focus on bandwidth from main memory (using the STREAM benchmark) as a function of the number of threads used. Published STREAM Bandwidth Results Official STREAM submission at: http://www.cs.virginia.edu/stream/stream_mail/2013/0015.html Compiled with icc… read more
accelerated computing
Coherence with Cached Memory-Mapped IO
In response to my previous blog entry, a question was asked about how to manage coherence for cached memory-mapped IO regions. Here are some more details… Maintaining Coherence with Cached Memory-Mapped IO For the “read-only” range, cached copies of MMIO lines will never be invalidated by external traffic, so repeated… read more
Notes on Cached Access to Memory-Mapped IO Regions
When attempting to build heterogeneous computers with “accelerators” or “coprocessors” on PCIe interfaces, one quickly runs into asymmetries between the data transfer capabilities of processors and IO devices. These asymmetries are often surprising — the tremendously complex processor is actually less capable of generating precisely controlled high-performance IO transactions than… read more