The fields of compute-in-memory (CIM) architectures, continual learning, and computer architecture are advancing rapidly, driven by the need for greater energy efficiency, performance, and adaptability in artificial intelligence (AI) workloads. A common theme across these areas is minimizing data movement, maximizing computational efficiency, and preserving previously learned knowledge.
In the realm of CIM architectures, innovative macros have been developed to perform complex operations such as matrix multiplication and dot-product computation efficiently. Notable examples include a digital SRAM-based compute-in-memory macro that achieves 34.1 TOPS/W energy efficiency and 120.77 GOPS/mm² area efficiency, outperforming CPU and GPU implementations. FERMI-ML, a flexible and resource-efficient memory-in-situ SRAM macro, supports variable-precision MAC and CAM operations, achieving 364 TOPS/W energy efficiency and 1.93 TOPS throughput. NL-DPE, a non-linear dot-product engine, overcomes the limitations of traditional CIM accelerators, delivering a 28× energy-efficiency improvement and a 249× speedup over a GPU baseline.
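The core operation these macros accelerate is the multiply-accumulate (MAC) at the heart of matrix multiplication. A minimal sketch of a variable-precision MAC, in the spirit of what a macro like FERMI-ML parallelizes in-array (function name and bit-widths are illustrative, not from the papers):

```python
import numpy as np

def quantized_mac(weights: np.ndarray, activations: np.ndarray, bits: int = 8) -> int:
    """Dot product with operands clipped to a given signed precision,
    mirroring a variable-precision in-memory MAC."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    w = np.clip(weights, lo, hi).astype(np.int64)
    a = np.clip(activations, lo, hi).astype(np.int64)
    # In a CIM macro, this accumulation happens inside the memory array
    # (e.g., on the bitlines), avoiding weight movement to a separate ALU.
    return int(np.dot(w, a))

w = np.array([3, -2, 5, 1])
a = np.array([1, 4, -1, 2])
print(quantized_mac(w, a, bits=4))  # 3*1 + (-2)*4 + 5*(-1) + 1*2 = -8
```

The energy win of CIM comes from performing this reduction where the weights already reside, rather than from the arithmetic itself.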
Continual learning is moving towards more effective methods for preserving knowledge and adapting to new tasks in vision-language models. Recent research has focused on the challenges of class-incremental learning, domain-incremental learning, and lifelong learning, with a particular emphasis on leveraging pre-trained models and multi-modal supervision. Notable advancements include the use of analytic contrastive projection, hierarchical semantic tree anchoring, and language-based anchors to mitigate catastrophic forgetting and improve performance. Papers such as AnaCP, LAVA, HASTEN, BOFA, and DMC have introduced novel methods and frameworks to enable incremental feature adaptation, preserve relative visual geometry, and reduce catastrophic forgetting.
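To make the forgetting problem concrete, here is a minimal sketch of the parameter-anchoring idea underlying many regularization-based continual-learning methods: the new-task loss is augmented with a penalty that keeps parameters near the values learned on earlier tasks. This is a generic illustration, not the specific mechanism of AnaCP, LAVA, HASTEN, BOFA, or DMC:

```python
import numpy as np

def anchored_loss(task_loss: float, params: np.ndarray,
                  old_params: np.ndarray, lam: float = 0.5) -> float:
    """New-task loss plus a quadratic penalty anchoring parameters to
    their previous-task values; lam trades plasticity for stability."""
    penalty = lam * float(np.sum((params - old_params) ** 2))
    return task_loss + penalty

old = np.array([1.0, 2.0])   # parameters after training on task 1
new = np.array([1.5, 1.0])   # candidate parameters while training task 2
print(anchored_loss(0.2, new, old, lam=0.5))  # 0.2 + 0.5 * (0.25 + 1.0) = 0.825
```

Methods such as those above refine this basic trade-off, e.g., by anchoring in feature space or in a language-derived embedding space rather than directly on weights.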
The field of computer architecture is witnessing a significant shift towards chiplet-based systems and processing-in-memory (PIM) architectures, aiming to address the memory bandwidth wall and improve the performance of memory-intensive workloads. Researchers are exploring various techniques to enable the construction of larger-scale VLSI systems with higher energy efficiency in data movement. Notably, the use of 2.5D/3D heterogeneous integration and the development of chiplet-based memory modules are gaining traction. Papers such as Sangam and DCC have presented a chiplet-based memory module and a data-centric ML compiler for PIM systems, respectively, achieving significant speedups and energy savings for large language model inference.
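A back-of-the-envelope roofline calculation shows why the memory bandwidth wall makes PIM attractive for low-arithmetic-intensity workloads such as LLM inference. All bandwidth and compute figures below are illustrative assumptions, not numbers from Sangam or DCC:

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      flops_per_byte: float) -> float:
    """Roofline model: attainable performance is capped by either peak
    compute or memory bandwidth times arithmetic intensity."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)

# A GEMV in LLM decoding does ~2 FLOPs per 2-byte weight -> ~1 FLOP/byte,
# so performance is bandwidth-bound on a conventional host.
host = attainable_gflops(peak_gflops=100_000, bandwidth_gbs=800, flops_per_byte=1.0)
pim = attainable_gflops(peak_gflops=4_000, bandwidth_gbs=8_000, flops_per_byte=1.0)
print(host, pim)  # 800.0 vs 4000.0: modest PIM compute beats a starved host
```

At one FLOP per byte, the hypothetical host reaches under 1% of its peak compute, while PIM's far higher internal bandwidth lets much weaker compute win, which is the essence of the bandwidth-wall argument.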
Overall, these advancements have the potential to significantly improve the performance and efficiency of various workloads, including large language models and vision transformers. As research in these areas continues to evolve, we can expect to see more innovative solutions that address the challenges of energy efficiency, adaptability, and performance in AI workloads.