Efficient Compute-in-Memory Architectures for AI Workloads

The field of compute-in-memory (CIM) architectures is advancing rapidly, with a focus on improving energy efficiency and performance for artificial intelligence (AI) workloads. Recent work has produced CIM macros that perform operations such as matrix multiplication and dot-product computation directly inside the memory array. These architectures minimize data movement and maximize computational efficiency, making them well suited to edge-side intelligent applications; a simplified sketch of the in-memory multiply-accumulate dataflow they rely on follows the list below. Noteworthy papers in this area include:

A digital SRAM-based compute-in-memory macro that achieves 34.1 TOPS/W energy efficiency and 120.77 GOPS/mm^2 area efficiency, outperforming CPU and GPU implementations.

FERMI-ML, a flexible and resource-efficient memory-in-situ SRAM macro that supports variable-precision MAC and CAM operations, achieving 364 TOPS/W energy efficiency and 1.93 TOPS throughput.

NL-DPE, an analog in-memory non-linear dot product engine that overcomes the limitations of traditional CIM accelerators, delivering a 28x energy-efficiency gain and a 249x speedup over a GPU baseline.
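To make the dataflow concrete, the following is a minimal behavioural sketch in Python of the bit-serial multiply-accumulate scheme commonly used in digital SRAM CIM macros: weight bits stay stationary in the array, activation bits are streamed in one bit-plane per cycle, and each cycle performs a bitwise AND, a popcount over the column, and a shift-accumulate. The function name, operand widths, and two's-complement sign handling here are illustrative assumptions, not details taken from the cited papers.

```python
import numpy as np

def cim_bitserial_dot(weights, activations, w_bits=8, a_bits=8):
    """Behavioural model of a bit-serial digital CIM dot product.

    weights     : (N,) signed ints, stored column-wise in the SRAM array
    activations : (N,) signed ints, streamed into the array bit by bit
    Returns the exact integer dot product, accumulated the way a digital
    CIM macro would: per-bit AND -> popcount/adder tree -> shift-accumulate.
    (Hypothetical sketch; not the interface of any cited macro.)
    """
    # Hold operands in two's complement at fixed bit widths.
    w = np.asarray(weights, dtype=np.int64) & ((1 << w_bits) - 1)
    a = np.asarray(activations, dtype=np.int64) & ((1 << a_bits) - 1)

    acc = 0
    for i in range(a_bits):                      # one activation bit-plane per cycle
        a_bit = (a >> i) & 1                     # broadcast activation bit i
        for j in range(w_bits):                  # stationary weight bit columns
            w_bit = (w >> j) & 1                 # stored weight bit j
            partial = int(np.sum(a_bit & w_bit)) # bitwise AND + popcount
            # Two's-complement sign: the MSB plane of each operand is negative.
            sign = -1 if (i == a_bits - 1) != (j == w_bits - 1) else 1
            acc += sign * (partial << (i + j))   # shift-accumulate
    return acc

if __name__ == "__main__":
    # Sanity check against a conventional dot product.
    w, a = [3, -2, 7, 0], [1, 4, -5, 6]
    assert cim_bitserial_dot(w, a) == int(np.dot(w, a))  # -40
    print(cim_bitserial_dot(w, a))
```

Because only single activation bits move through the array each cycle while the weight bits never leave their SRAM columns, this kind of dataflow is one way such macros keep data movement, and therefore energy per MAC, low.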

Sources

A digital SRAM-based compute-in-memory macro for weight-stationary dynamic matrix multiplication in Transformer attention score computation

FERMI-ML: A Flexible and Resource-Efficient Memory-In-Situ SRAM Macro for TinyML acceleration

Segmented Exponent Alignment and Dynamic Wordline Activation for Floating-Point Analog CIM Macros

NL-DPE: An Analog In-memory Non-Linear Dot Product Engine for Efficient CNN and LLM Inference

A Bit Level Weight Reordering Strategy Based on Column Similarity to Explore Weight Sparsity in RRAM-based NN Accelerator

DARE: An Irregularity-Tolerant Matrix Processing Unit with a Densifying ISA and Filtered Runahead Execution

CIMinus: Empowering Sparse DNN Workloads Modeling and Exploration on SRAM-based CIM Architectures
