Advancements in Compute-Near-Memory Systems and Efficient Inference Techniques

Research at the intersection of computer architecture and artificial intelligence is converging on two themes: compute-near-memory (CNM) systems and efficient inference for large models. On the systems side, researchers are building cost modeling frameworks for CNM designs and modular frameworks for more efficient GPU usage; on the inference side, the focus is on mixed-precision techniques for large language models and on specialized neural processing engines. Together, these efforts aim to improve performance, reduce memory bandwidth requirements, and increase energy efficiency. Noteworthy papers include CoMoNM, which presents a generic cost modeling framework for compute-near-memory systems, and XR-NPE, which proposes a high-throughput mixed-precision SIMD neural processing engine. Other notable works, including Visual Perception Engine, MemorySim, and TurboMind, report further gains in efficient inference and compute-near-memory design.
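The core idea behind mixed-precision inference mentioned above, storing weights at low precision to cut memory bandwidth while keeping activations at higher precision, can be illustrated with a minimal sketch. This is a generic simulation of symmetric per-tensor INT8 weight quantization, not the specific scheme used by XR-NPE or any other cited paper; all function names here are illustrative.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: map FP32 weights to int8 plus one scale."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def mixed_precision_matmul(x, q_w, scale):
    """Keep activations in FP32; dequantize the INT8 weights on the fly."""
    return x @ (q_w.astype(np.float32) * scale)

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
x = rng.standard_normal((8, 256)).astype(np.float32)

q_w, scale = quantize_int8(w)
y_ref = x @ w                                   # full-precision reference
y_q = mixed_precision_matmul(x, q_w, scale)     # mixed-precision result

# INT8 storage is 4x smaller than FP32, at the cost of a small output error.
rel_err = np.linalg.norm(y_q - y_ref) / np.linalg.norm(y_ref)
print(q_w.nbytes, w.nbytes, rel_err)
```

The 4x reduction in weight bytes is what translates into lower memory bandwidth demand; hardware engines like the one XR-NPE describes perform the low-precision arithmetic natively rather than dequantizing in software as this sketch does.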
Sources
XR-NPE: High-Throughput Mixed-precision SIMD Neural Processing Engine for Extended Reality Perception Workloads
WeedSense: Multi-Task Learning for Weed Segmentation, Height Estimation, and Growth Stage Classification