The field of efficient computing for AI and edge devices is evolving rapidly, with a focus on raising throughput, reducing latency, and improving energy efficiency. Recent work centers on architectures, scheduling techniques, and compilation frameworks that speed up inference on complex workloads without sacrificing accuracy. Notably, researchers have explored sparse and operator-aware hybrid scheduling, on-demand multi-task sparsity, and associative memory-based architectures to accelerate deep neural network inference. There has also been a push toward compiler tooling and performance profiling techniques that expose and help optimize the behavior of accelerator compilers. Together, these advances promise substantially more efficient AI and edge computing applications.
Noteworthy papers include: AutoSAGE, which presents an input-aware CUDA scheduler for sparse GNN aggregation; CAMformer, which proposes a novel accelerator that reinterprets attention as an associative memory operation; and IntAttention, which introduces a fully integer attention pipeline for efficient edge inference.
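To make the last idea concrete, here is a minimal NumPy sketch of attention with quantized, integer matrix multiplies, in the spirit of an integer attention pipeline. It is illustrative only: the function names and the symmetric per-tensor quantization scheme are our own assumptions, and the softmax is evaluated in floating point for clarity, whereas a fully integer pipeline such as IntAttention's would replace it with a fixed-point approximation.

```python
import numpy as np

def quantize(x, n_bits=8):
    """Symmetric per-tensor quantization: float array -> integer values plus a scale."""
    scale = np.abs(x).max() / (2 ** (n_bits - 1) - 1)
    return np.round(x / scale).astype(np.int32), scale

def quantized_attention(Q, K, V, n_bits=8):
    """Hypothetical sketch of attention with integer matmuls.
    Q @ K^T accumulates in int32; multiplying by the product of the
    two scales dequantizes the scores exactly."""
    d = Q.shape[-1]
    Qq, sq = quantize(Q, n_bits)
    Kq, sk = quantize(K, n_bits)
    scores = (Qq @ Kq.T) * (sq * sk) / np.sqrt(d)

    # Softmax in float for clarity; a fully integer pipeline would use a
    # fixed-point exp approximation and integer normalization instead.
    scores -= scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)

    # Requantize the attention weights so the second matmul is integer too.
    Pq, sp = quantize(probs, n_bits)
    Vq, sv = quantize(V, n_bits)
    return (Pq @ Vq) * (sp * sv)

# Tiny usage example with random data.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 16)) for _ in range(3))
print(quantized_attention(Q, K, V).shape)  # (4, 16)
```

The design point this sketch highlights is that both matrix multiplies, which dominate attention cost, can run entirely in integer arithmetic; only the normalization step needs special treatment, which is exactly where integer-attention work concentrates its effort.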