Efficient Compute-in-Memory Architectures for AI Workloads

The field of compute-in-memory (CIM) architectures is advancing rapidly, with a focus on improving energy efficiency and performance for artificial intelligence (AI) workloads. Recent work has produced CIM macros that efficiently perform core operations such as matrix multiplication and dot-product computation. These architectures minimize data movement between memory and compute units and maximize computational efficiency, making them well suited to intelligent edge applications. Noteworthy papers in this area include:

- A digital SRAM-based compute-in-memory macro that achieves 34.1 TOPS/W energy efficiency and 120.77 GOPS/mm² area efficiency, outperforming CPU and GPU implementations.
- FERMI-ML, a flexible and resource-efficient memory-in-situ SRAM macro that supports variable-precision MAC and CAM operations, achieving 364 TOPS/W energy efficiency and 1.93 TOPS throughput.
- NL-DPE, a non-linear dot-product engine that addresses the limitations of conventional CIM accelerators, delivering 28x higher energy efficiency and a 249x speedup over a GPU baseline.
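To make the data-movement argument concrete, the following is a minimal behavioral sketch of a weight-stationary CIM tile computing dot products with bit-serial activations. It is an illustration only, not the design of any macro cited above; the array size, bit widths, and the `cim_tile_matvec` name are assumptions introduced here.

```python
# Illustrative sketch: behavioral model of a weight-stationary CIM tile.
# Weights stay resident in the array; activation bits are streamed in and
# partial sums are combined with digital shift-and-add. Not any paper's design.
import numpy as np

def cim_tile_matvec(weights, activations, act_bits=8):
    """Compute activations @ weights the way a weight-stationary tile would:
    one activation bit-plane per step, accumulated digitally."""
    rows, cols = weights.shape
    assert activations.shape == (rows,)
    acc = np.zeros(cols, dtype=np.int64)
    for b in range(act_bits):
        bit_plane = (activations >> b) & 1          # 0/1 vector, one bit per input
        partial = bit_plane @ weights               # in-array column-wise accumulation
        acc += partial.astype(np.int64) * (1 << b)  # digital shift-and-add
    return acc

# Example: dot products of an 8-bit input vector against a stationary weight tile.
rng = np.random.default_rng(0)
W = rng.integers(-8, 8, size=(64, 16))              # weights written once, reused
x = rng.integers(0, 256, size=64)                   # unsigned 8-bit activations
assert np.array_equal(cim_tile_matvec(W, x), x @ W)
```

The point of the mapping is that the weight matrix is written into the array once and only activation bits move per step, which is the data-movement saving these macros exploit.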
Sources
A digital SRAM-based compute-in-memory macro for weight-stationary dynamic matrix multiplication in Transformer attention score computation
A Bit Level Weight Reordering Strategy Based on Column Similarity to Explore Weight Sparsity in RRAM-based NN Accelerator