The University of Texas at Austin

Past Semesters

September 24, 2024, Filed Under: 2024 Fall Semester, Current Semester

[Series 08] Enabling Efficient Memory Systems using Novel Compression Methods

Title: Enabling Efficient Memory Systems using Novel Compression Methods

Speaker: Per Stenström

Chalmers University of Technology / ZeroPoint Technologies
Gothenburg, Sweden

Date: Nov 7th, 2024 at 3:30 pm

Location: EER 3.646 or Zoom Link

Abstract:

Using data compression in the memory hierarchy can improve the efficiency
of memory systems by enabling higher effective cache capacity, more
effective use of available memory bandwidth, and higher effective main
memory capacity. This can lead to substantially higher performance and
lower power consumption. However, realizing these benefits requires highly
effective compression algorithms that can be implemented with low latency
and high throughput. Research at Chalmers University of Technology and at
ZeroPoint Technologies, a fabless startup company, has yielded several new
families of compression methods that are now being commercially deployed.
This talk will present the major insights of more than a decade of research
on compression methods for the memory hierarchy. The talk covers
value-aware caches and statistical compression of cache content;
compression algorithms that are tuned to the data at hand through data
analysis with new clustering algorithms, allowing substantially higher
memory bandwidth; and compression infrastructures that expand the capacity
of main memory.
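To give a concrete flavor of statistical compression: the words of a cache line are often highly skewed (many zeros, a few repeated constants), so entropy coding can shrink them considerably. The sketch below illustrates only the general idea, not the speaker's actual designs; it builds Huffman codes over the words of one hypothetical cache line, and all names and values are invented for illustration.

```python
import heapq
from collections import Counter

def huffman_codes(values):
    """Build prefix codes: frequent values get short codes."""
    freq = Counter(values)
    # Heap entries: (frequency, unique tie-breaker, {value: code-so-far}).
    heap = [(f, i, {v: ""}) for i, (v, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: a single distinct value
        (_, _, codes), = heap
        return {v: "0" for v in codes}
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {v: "0" + c for v, c in c1.items()}
        merged.update({v: "1" + c for v, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

# A cache line dominated by a few values compresses well:
line = [0, 0, 0, 0, 0xFFFF, 0, 0, 0xDEAD]   # eight 32-bit words
codes = huffman_codes(line)
compressed_bits = sum(len(codes[w]) for w in line)
print(compressed_bits, "bits vs", 32 * len(line), "uncompressed")
# → 10 bits vs 256 uncompressed
```

A real hardware scheme must also bound code lengths and decode at cache-access latency, which is exactly why low-latency, high-throughput algorithm design is the hard part.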

Bio:

Per Stenström is a professor at Chalmers University of Technology. His
research interests are in parallel computer architecture. He has authored
or co-authored four textbooks, about 200 publications, and twenty patents
in this area. He has been program chair of several top-tier IEEE and ACM
conferences, including the ACM/IEEE International Symposium on Computer
Architecture (ISCA), and serves as Associate Editor of ACM TACO and Topical
Editor of IEEE Transactions on Computers. He is a Fellow of the ACM and the
IEEE and a member of Academia Europaea and the Royal Swedish Academy of
Engineering Sciences.

September 10, 2024, Filed Under: 2024 Fall Semester, Current Semester

[Series 02] Leveraging the IRON AI Engine API to program the Ryzen™ AI NPU

Title: Leveraging the IRON AI Engine API to program the Ryzen™ AI NPU

Speaker: Kristof Denolf & Joseph Melber

Date: September 17, 2024 at 3:30 pm

Location: EER 3.646 or Zoom Link

Abstract: Specialized hardware accelerators are widely available today, including the NPUs found in consumer laptops with AMD Ryzen™ AI CPUs. The NPU in AMD Ryzen™ AI devices includes an AI Engine array composed of a set of VLIW vector processors, data movement accelerators (DMAs), and adaptable interconnect. Convenient software tool flows for programming these devices enable enthusiasts to productively harness the full capabilities of these powerful NPUs. IRON is a close-to-metal, open-source toolkit that enables performance engineers to build fast and efficient, often specialized, designs through a set of Python language bindings around the mlir-aie dialect. The presentation will provide insights into the AI Engine compute and data movement capabilities supported in our tool flow. The speakers will demonstrate performance optimizations of increasingly complex designs by leveraging the unique architectural features of AI Engines.
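To give a rough feel for the explicit-dataflow style such toolkits expose, the toy model below chains per-tile kernels over DMA-streamed chunks. This is plain illustrative Python under our own assumptions; every function and kernel name here is invented and is not the IRON or mlir-aie API.

```python
# Toy model of an accelerator array: each "tile" runs a kernel on chunks
# that a "DMA" streams in and out. All names are illustrative, not IRON's API.
from typing import Callable, List

def dma_split(data: List[int], chunk: int):
    """Stream fixed-size chunks, like a data-movement engine feeding a tile."""
    for i in range(0, len(data), chunk):
        yield data[i:i + chunk]

def run_pipeline(data, chunk, kernels: List[Callable]):
    """Push each chunk through a chain of per-tile kernels."""
    out = []
    for c in dma_split(data, chunk):
        for k in kernels:        # each kernel stands in for one tile's program
            c = k(c)
        out.extend(c)
    return out

scale = lambda c: [2 * x for x in c]       # tile 0: elementwise scale
relu  = lambda c: [max(0, x) for x in c]   # tile 1: elementwise ReLU
print(run_pipeline([-2, -1, 0, 1, 2, 3], chunk=3, kernels=[scale, relu]))
# → [0, 0, 0, 2, 4, 6]
```

In the real tool flow, the kernels are VLIW vector programs and the chunking/streaming is configured explicitly on the DMAs, which is where the performance engineering happens.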

Bio:

Kristof Denolf is a Fellow in AMD’s Research and Advanced Development group, where he works on energy-efficient computer vision and video processing applications to shape future AMD devices. He earned an M.Eng. in electronics from the Katholieke Hogeschool Brugge-Oostende (1998), now part of KU Leuven, an M.Sc. in electronic system design from Leeds Beckett University (2000), and a Ph.D. from Eindhoven University of Technology (2007). He has over 25 years of combined research and industry experience at IMEC, Philips, Barco, Apple, Xilinx, and AMD. His main research interests cover all aspects of the cost-efficient and dataflow-oriented design of video, vision, and graphics systems.

Joseph Melber is a Senior Member of Technical Staff in AMD’s Research and Advanced Development group. At AMD, he is working on hardware architectures and compiler technologies for current and future AMD devices. He received a BS in electrical engineering from the University at Buffalo, as well as MS and PhD degrees from the Electrical and Computer Engineering Department at Carnegie Mellon University. His research interests include runtime systems, compiler abstractions for data movement, and hardware prototypes for future adaptive heterogeneous computing architectures.

September 3, 2024, Filed Under: 2024 Fall Semester, Current Semester

[Series 01a] Experimentally Understanding and Efficiently Mitigating DRAM Read Disturbance

Title: Experimentally Understanding and Efficiently Mitigating DRAM Read Disturbance

Speaker: Ataberk Olgun

Date: September 10, 2024 at 3:30 pm

Location: EER 3.650 or Zoom Link

Talk abstract: DRAM chips are increasingly vulnerable to read disturbance phenomena (e.g., RowHammer and RowPress), where repeatedly accessing DRAM rows causes bitflips in nearby rows as DRAM density scales. Even though many prior works develop various RowHammer solutions, these solutions incur non-negligible system performance, energy, and hardware area overheads that grow as the RowHammer vulnerability worsens.

In this talk, we will present our recent works on 1) understanding DRAM read disturbance in modern high bandwidth memory (HBM) chips, along with the open source infrastructure that enables experimental studies on state-of-the-art DRAM chips, and 2) performance-, energy-, and area-efficient system-level solutions to read disturbance. First, we describe the results of a detailed experimental analysis of read disturbance in six real HBM2 chips. We show that (1) the read disturbance vulnerability significantly varies between different HBM2 chips and between different components (e.g., 3D-stacked channels) inside a chip, (2) DRAM rows at the end and in the middle of a bank are more resilient to read disturbance, (3) fewer additional activations are sufficient to induce more read disturbance bitflips in a DRAM row if the row exhibits the first bitflip at a relatively high activation count, and (4) a modern HBM2 chip implements undocumented read disturbance defenses that track potential aggressor rows based on how many times they are activated. We also briefly describe the infrastructure that enabled the discoveries we made in our study on read disturbance in high bandwidth memory chips along with those made in multiple recent works that investigate read disturbance in real DRAM chips (e.g., RowPress).

Second, we introduce ABACuS, a new low-cost, hardware-counter-based RowHammer mitigation technique that scales with worsening RowHammer vulnerability in a performance-, energy-, and area-efficient way. ABACuS’s key idea is to use a single shared row activation counter to track activations to the rows with the same row address across all DRAM banks. Unlike state-of-the-art RowHammer mitigation mechanisms that implement a separate row activation counter for each DRAM bank, ABACuS implements fewer counters (e.g., only one) to track an equal number of aggressor rows. At very low RowHammer thresholds (where only 125 activations cause a bitflip), ABACuS incurs only small system performance and DRAM energy overheads, and it outperforms and occupies less chip area than state-of-the-art mitigation techniques such as Hydra and Graphene.
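The shared-counter idea can be sketched in a few lines: one counter per row address, shared by all banks, triggers a preventive refresh when it crosses a threshold. This is a simplified illustration under our own assumptions; the class and method names are invented here, and the real ABACuS design additionally tracks per-bank sibling activations and bounds the counter table with an eviction policy.

```python
class SharedRowCounter:
    """One activation counter per row address, shared across all banks
    (simplified sketch of the ABACuS idea; no eviction policy)."""
    def __init__(self, threshold):
        self.threshold = threshold   # activations before preventive refresh
        self.count = {}              # row address -> shared activation count

    def activate(self, bank, row):
        # `bank` is intentionally ignored: activations to row address R
        # in *any* bank all increment the same shared counter.
        self.count[row] = self.count.get(row, 0) + 1
        if self.count[row] >= self.threshold:
            self.count[row] = 0
            return f"refresh neighbors of row {row} in all banks"
        return None

ctr = SharedRowCounter(threshold=4)
for _ in range(3):
    assert ctr.activate(bank=0, row=42) is None
print(ctr.activate(bank=1, row=42))   # fourth activation crosses the threshold
```

Sharing one counter across banks over-approximates any single bank's activation count, so it stays safe while needing far fewer counters than a per-bank table.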

All data, sources, and paper PDFs for the described works are freely and openly available.
– HBM Read Disturbance: https://github.com/CMU-SAFARI/HBM-Read-Disturbance, Paper PDF: https://arxiv.org/pdf/2310.14665
– DRAM Bender: https://github.com/CMU-SAFARI/DRAM-Bender, Paper PDF: https://arxiv.org/pdf/2211.05838
– ABACuS sources: https://github.com/CMU-SAFARI/ABACuS, Paper PDF: https://arxiv.org/pdf/2310.09977

Bio: Ataberk Olgun is a third-year PhD student at ETH Zurich. His broad research interests include designing secure, high-performance, and energy-efficient DRAM architectures. With the RowHammer vulnerability worsening, it is increasingly difficult to design new DRAM architectures that satisfy all three goals. His current research focuses on (i) deeply understanding and (ii) efficiently mitigating the RowHammer vulnerability in modern systems.

September 3, 2024, Filed Under: 2024 Fall Semester, Current Semester

[Series 01b] Enabling the Adoption of Data-Centric Systems: Hardware/Software Support for Processing-Using-Memory Architectures

Title: Enabling the Adoption of Data-Centric Systems: Hardware/Software Support for Processing-Using-Memory Architectures

Speaker: Geraldo F. Oliveira

Date: September 11, 2024 at 2:00 pm

Location: EER 0.806/808 or Zoom Link

Talk abstract: The increasing prevalence and growing size of data in modern applications have led to high performance and energy costs for computation in traditional processor-centric computing systems. To mitigate these costs, the processing-in-memory (PIM) paradigm moves computation closer to where the data resides, reducing (and sometimes eliminating) the need to move data between memory and the processor. There are two main approaches to PIM: (1) processing-near-memory (PNM), where PIM logic is added to the same die as memory or to the logic layer of 3D-stacked memory, and (2) processing-using-memory (PUM), which uses the operational principles of memory cells to perform computation. Due to a push from the application domain and recent developments in memory manufacturing and packaging, memory manufacturers (and startups) have finally introduced the first real-world PNM architectures into the market. However, fully adopting PUM in today’s systems is still very challenging due to the lack of tools and system support for such architectures across the computer architecture stack, including (i) frameworks that can facilitate the implementation of complex operations and algorithms using the underlying PUM primitives; (ii) execution models that can take advantage of the available application parallelism to maximize hardware utilization and throughput; (iii) compiler support and compiler optimizations targeting PUM architectures; and (iv) operating system support for PUM-aware virtual memory and memory management.

In this talk, we will discuss our major recent research results on different tools and system support for PUM architectures (with a focus on DRAM-based solutions), which aim to ease the adoption of such architectures in current and future systems. Our work builds on prior works [1, 2] showing that current DRAM chips can be modified slightly to execute simple data movement and Boolean operations, unleashing the PUM capabilities of current memory technologies. Based on that, we will first describe our efforts to further extend the capabilities of PUM solutions to enable their applicability to various workloads. To do so, we implement complex PUM operations using (i) SIMDRAM [3], an end-to-end framework that composes PUM primitives to implement complex arithmetic operations entirely within DRAM in a single-instruction multiple-data (SIMD) manner; and (ii) pLUTo [4], a PUM architecture that leverages the high storage density of DRAM to enable the massively parallel storing and querying of lookup tables (LUTs) instead of relying on complex extra in-DRAM logic. Second, we propose system solutions that expose the newly added PUM capabilities to the application stack, focusing on programmer-friendly approaches. Concretely, we will discuss MIMDRAM [5], a hardware/software co-designed PUM system that introduces the ability to allocate and control only the required amount of computing resources inside the DRAM subarray for PUM computation. MIMDRAM implements compiler passes and system support to guarantee high utilization of the PUM substrate. Third, we extensively analyze current commodity off-the-shelf (COTS) DRAM chips to characterize their capability to perform PUM operations with modifications only to the DRAM controller, not to the DRAM chip or interface [6].
We demonstrate that (1) PUM architectures are a promising solution, leading to significant (e.g., more than an order of magnitude) performance and energy gains compared to processor-centric systems for various real-world applications, and (2) COTS DRAM chips are capable of performing a range of PUM operations with high success rates.
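The bit-serial, vertically laid-out SIMD style that SIMDRAM automates can be emulated in software: each bit-plane plays the role of one DRAM row, and adding N-bit elements across all lanes takes one pass of bulk bitwise operations per bit position, the kind of operations an Ambit-like substrate provides. A minimal sketch, illustrative only and not SIMDRAM itself:

```python
def to_bitplanes(values, bits):
    """Vertical layout: plane[i] packs bit i of every element (one 'DRAM row')."""
    return [sum(((v >> i) & 1) << lane for lane, v in enumerate(values))
            for i in range(bits)]

def from_bitplanes(planes, lanes):
    """Recover the horizontal values from the vertical bit-plane layout."""
    return [sum(((p >> lane) & 1) << i for i, p in enumerate(planes))
            for lane in range(lanes)]

def bitserial_add(a_planes, b_planes):
    """Ripple-carry addition on all lanes at once: a few bulk ops per bit."""
    carry, out = 0, []
    for a, b in zip(a_planes, b_planes):
        out.append(a ^ b ^ carry)             # sum bit for every lane
        carry = (a & b) | (carry & (a ^ b))   # carry bit for every lane
    out.append(carry)                         # final carry plane
    return out

a, b = [3, 5, 7, 0], [1, 2, 8, 15]
planes = bitserial_add(to_bitplanes(a, 4), to_bitplanes(b, 4))
print(from_bitplanes(planes, lanes=4))   # → [4, 7, 15, 15]
```

The cost per operation is independent of the number of lanes, which is where the massive in-DRAM parallelism comes from; frameworks like SIMDRAM generate such bulk-bitwise sequences automatically.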

[1] V. Seshadri, Y. Kim et al., “RowClone: Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization,” in MICRO, 2013.
[2] V. Seshadri, D. Lee et al., “Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology,” in MICRO, 2017.
[3] N. Hajinazar, G. F. Oliveira et al., “SIMDRAM: A Framework for Bit-Serial SIMD Processing Using DRAM,” in ASPLOS, 2021.
[4] J. D. Ferreira, G. Falcao et al., “pLUTo: Enabling Massively Parallel Computation in DRAM via Lookup Tables,” in MICRO, 2022.
[5] G. F. Oliveira, A. Olgun et al., “MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Computing,” in HPCA, 2024.
[6] I. E. Yuksel, Y. C. Tugrul et al., “Functionally-Complete Boolean Logic in Real DRAM Chips: Experimental Characterization and Analysis,” in HPCA, 2024.

Bio: Geraldo F. Oliveira (https://geraldofojunior.github.io/) is a Ph.D. candidate in the SAFARI Research Group at ETH Zürich, working with Prof. Onur Mutlu. His broader research interests are in computer architecture and systems, focusing on memory-centric architectures for high-performance and energy-efficient systems. In particular, his Ph.D. research focuses on taking advantage of new memory technologies to accelerate distinct classes of applications and on providing system support for novel memory-centric systems. Geraldo has published several works on this topic in major conferences and journals such as HPCA, ASPLOS, ISCA, MICRO, and IEEE Micro.

April 2, 2024, Filed Under: Past Semesters

2024 Spring

  • How does one bit-flip corrupt an entire deep neural network, and what to do about it
    Speaker: Yanjing Li
    Date: April 16, 2024 at 3:30pm
    Location: EER 3.646

September 16, 2022, Filed Under: 2022 Fall Semester, Past Semesters

2022 Fall

  • HBM3 RAS: The Journey to Enhancing Die-Stacked DRAM Resilience at Scale
    Speaker: Sudhanva Gurumurthi (AMD)
    Date: November 8, 2022 at 3:30pm
    Location: EER 3.646
  • Accelerating the Pace of AWS Inferentia Chip Development, From Concept to End Customer Use
    Speaker: Randy Huang, Amazon AWS
    Date: October 18, 2022 at 3:30pm
    Location: EER 3.646
  • Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture
    Speaker: Juan Gómez Luna, ETH Zurich
    Date: September 20, 2022 at 3:30pm
    Location: EER 3.646 or Zoom

January 16, 2020, Filed Under: 2020 Spring Semester, Past Semesters

2020 Spring

  • Tackling the MPSoC Data Locality Challenge with Regional Coherence and Near Memory Acceleration
    Speaker: Andreas Herkersdorf, TU Munich
    Date: February 4, 2020
  • SPIN, SWAP and DRAIN: A New Approach to Address Deadlocks in Interconnection Networks
    Speaker: Paul V Gratz, Texas A&M
    Date: January 28, 2020
  • Meeting the Systems Challenge of Deep Learning
    Speaker: Rob Schreiber, Cerebras Systems
    Date: February 25, 2020

September 16, 2019, Filed Under: 2019 Fall Semester, Past Semesters

2019 Fall

  • Unlocking the Full Potential of Persistent Memory Technique with Software/Hardware Coordinated Design
    Speaker: Jishen Zhao, UCSD
    Date: September 10, 2019
  • Enabling Continuous Learning through Synaptic Plasticity in Hardware
    Speaker: Tushar Krishna, Georgia Tech
    Date: September 17, 2019
  • µIRs: Intermediate Representation for Agile Design of Accelerators
    Speaker: Arrvindh Shriraman, Simon Fraser University
    Date: September 17, 2019
  • The Next Quintillion Pixels: Architecting Next-Generation Mobile Visual Computing Systems
    Speaker: Yuhao Zhu, University of Rochester
    Date: September 17, 2019
  • Toward Efficient and Protected Address Translation in Memory Management
    Speaker: Sandhya Dwarkadas, University of Rochester
    Date: September 24, 2019
  • TensorDIMM: A Practical Near-Memory Processing Architecture for Sparse Embedding Layers in Deep Learning
    Speaker: Minsoo Rhu, KAIST
    Date: October 18, 2019
  • LCI: A Lightweight Communication Interface for HPC
    Speaker: Marc Snir, UIUC
    Date: October 29, 2019
  • Centaur Technology’s Deep-Learning Coprocessor Technology
    Speaker: Glenn Henry, Centaur Technology
    Date: November 12, 2019
  • The Vision Behind MLPerf (mlperf.org): A Community-driven Benchmark Suite for ML Frameworks, ML Accelerators and ML Systems in Cloud and Edge Computing
    Speaker: Vijay Janapa Reddi, Harvard
    Date: November 5, 2019

January 16, 2019, Filed Under: 2019 Spring Semester, Past Semesters

2019 Spring

  • All Tomorrow’s Memory Systems
    Speaker: Bruce Jacob, UMD
    Date: January 22, 2019
  • Memory Systems and Memory-Centric Computing Systems: Challenges and Opportunities
    Speaker: Onur Mutlu, ETH Zurich
    Date: January 29, 2019
  • Architecting the Future with Humility from the Past
    Speaker: Doug Carmean, Microsoft
    Date: February 26, 2019

September 16, 2018, Filed Under: Past Semesters

2018 Fall

  • Redesigning System Software for Shared Hardware
    Speaker: Xiaosong Ma, Qatar Computing Research Institute
    Date: November 9, 2018
  • Architectural Tradeoffs in Designing NVM based LLC
    Speaker: Ishwar Bhati, Intel
    Date: October 2, 2018
  • Navigating the Challenges of Industrial Performance Simulation
    Speaker: David Nellans, NVIDIA
    Date: October 16, 2018
  • Redefining Interfaces: ML to Accelerate and Design Computer Systems
    Speaker: Milad Hashemi, Google
    Date: October 30, 2018
  • Squeezing Software Performance via Eliminating Wasteful Memory Operations
    Speaker: Xu Liu, William & Mary
    Date: November 8, 2018
  • Navigating the Challenges of Industrial Performance Simulation
    Speaker: Mihai Christodorescu, Visa Research
    Date: November 13, 2018
  • Deep Learning Opportunities and Limitations
    Speaker: Nader Bagherzadeh, UC Irvine
    Date: December 14, 2018



© The University of Texas at Austin 2026