Advancements in Compute-Near-Memory Systems and Efficient Inference Techniques

The fields of computer architecture and artificial intelligence are seeing significant advances in compute-near-memory systems and efficient inference techniques. On the systems side, researchers are developing cost modeling frameworks for compute-near-memory architectures and modular frameworks for efficient GPU usage; on the inference side, attention is turning to mixed-precision techniques for large language models and neural processing engines. Together, these efforts aim to improve performance, reduce memory bandwidth requirements, and increase energy efficiency. Noteworthy papers include CoMoNM, a generic cost modeling framework for compute-near-memory systems, and XR-NPE, a high-throughput mixed-precision SIMD neural processing engine. Other notable works, including Visual Perception Engine, MemorySim, and TurboMind, report significant improvements in efficient inference and compute-near-memory simulation.
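To make the mixed-precision idea concrete, here is a minimal, generic sketch (not the method of any paper listed below): weights are stored in low-precision int8 with per-channel scales, while activations stay in fp32 and weights are dequantized during the matrix multiply. All function names are illustrative, and real engines such as those surveyed here fuse the dequantization into custom kernels rather than materializing fp32 weights.

```python
import numpy as np

def quantize_per_channel(w, bits=8):
    """Symmetric per-output-channel quantization of an fp32 weight matrix."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for int8
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)        # avoid division by zero
    q = np.round(w / scale).astype(np.int8)
    return q, scale.astype(np.float32)

def mixed_precision_matmul(x, q, scale):
    """fp32 activations times int8 weights, dequantized on the fly."""
    return x @ (q.astype(np.float32) * scale).T

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)  # fp32 reference weights
x = rng.standard_normal((2, 8)).astype(np.float32)  # fp32 activations

q, s = quantize_per_channel(w)
y_mixed = mixed_precision_matmul(x, q, s)
y_full = x @ w.T                                    # full-precision baseline
```

The int8 result tracks the fp32 baseline closely while roughly quartering weight-memory traffic, which is the bandwidth saving the inference papers below target.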

Sources

CoMoNM: A Cost Modeling Framework for Compute-Near-Memory Systems

Visual Perception Engine: Fast and Flexible Multi-Head Inference for Robotic Vision Tasks

MemorySim: An RTL-level, timing accurate simulator model for the Chisel ecosystem

XR-NPE: High-Throughput Mixed-precision SIMD Neural Processing Engine for Extended Reality Perception Workloads

Tight Inter-Core Cache Contention Analysis for WCET Estimation on Multicore Systems

WeedSense: Multi-Task Learning for Weed Segmentation, Height Estimation, and Growth Stage Classification

Leveraging Hardware-Aware Computation in Mixed-Precision Matrix Multiply: A Tile-Centric Approach

GasTwinFormer: A Hybrid Vision Transformer for Livestock Methane Emission Segmentation and Dietary Classification in Optical Gas Imaging

Efficient Mixed-Precision Large Language Model Inference with TurboMind
