Advancements in Chiplet-Based Systems and Processing-in-Memory Architectures

Computer architecture research is shifting toward chiplet-based systems and processing-in-memory (PIM) architectures. Both approaches target the memory bandwidth wall and aim to improve the performance of memory-intensive workloads. Chiplet research focuses on building larger-scale VLSI systems that move data more energy-efficiently, with 2.5D/3D heterogeneous integration and chiplet-based memory modules gaining traction. PIM research aims to reduce data movement by computing close to where data is stored, and is exploring the integration of more capable processing components, such as systolic arrays and SRAM-based buffers. A further line of work seeks to make better use of CPU pins to alleviate memory bandwidth constraints. Together, these advances could substantially improve the performance and efficiency of workloads such as large language models and vision transformers.

Noteworthy papers in this area include Sangam and DCC. Sangam presents a chiplet-based memory module that achieves significant speedup and energy savings for large language model inference. DCC is a data-centric ML compiler for PIM systems that jointly optimizes data rearrangements and compute code, achieving up to 7.68x speedup on HBM-PIM and up to 13.17x speedup on the AttAcc PIM backend over GPU-only execution.

Sources

Tiny Chiplets Enabled by Packaging Scaling: Opportunities in ESD Protection and Signal Integrity

Sangam: Chiplet-Based DRAM-PIM Accelerator with CXL Integration for LLM Inferencing

Pushing the Memory Bandwidth Wall with CXL-enabled Idle I/O Bandwidth Harvesting

Dissecting and Re-architecting 3D NAND Flash PIM Arrays for Efficient Single-Batch Token Generation in LLMs

Pico-Cloud: Cloud Infrastructure for Tiny Edge Devices

Inside VOLT: Designing an Open-Source GPU Compiler

PIM or CXL-PIM? Understanding Architectural Trade-offs Through Large-Scale Benchmarking

CoroAMU: Unleashing Memory-Driven Coroutines through Latency-Aware Decoupled Operations

Hemlet: A Heterogeneous Compute-in-Memory Chiplet Architecture for Vision Transformers with Group-Level Parallelism

A Tensor Compiler for Processing-In-Memory Architectures

Toward Open-Source Chiplets for HPC and AI: Occamy and Beyond

BlueScript: A Disaggregated Virtual Machine for Microcontrollers

Can Asymmetric Tile Buffering Be Beneficial?
