Advancements in Edge AI and Heterogeneous Computing

The field of edge AI and heterogeneous computing is evolving rapidly, with a focus on improving performance, energy efficiency, and cost-effectiveness. Recent work centers on specialized architectures, such as chiplet-based systems and field-programmable gate arrays (FPGAs), that accelerate machine learning workloads. There is also growing emphasis on optimizing concurrent deep neural network (DNN) training and inference on edge devices, and on characterizing how accelerated edge devices perform when training DNN models.

Noteworthy papers in this area include:

Automatic Microarchitecture-Aware Custom Instruction Design for RISC-V Processors presents a front-to-back tool for application-specific instruction-set processor (ASIP) design that automatically analyzes hotspots in RISC-V applications and generates custom instruction suggestions.

Decentor-V extends L-SGD to RISC-V-based microcontrollers (MCUs) and introduces an 8-bit quantized version of L-SGD for RISC-V, achieving a nearly 4x reduction in memory usage and a 2.2x speedup in training time.

Chiplet-Based RISC-V SoC with Modular AI Acceleration presents a chiplet-based RISC-V SoC architecture that integrates four key innovations and reports significant performance improvements.

Pagoda develops a time roofline and a novel energy roofline model for the Jetson Orin AGX and couples them with an analytical model to analyze DNN inference workloads from first principles.

Fulcrum designs an intelligent time-slicing approach for concurrent DNN training and inference on Jetson devices, and proposes multi-dimensional gradient descent search and active learning techniques that optimize training throughput while staying within latency and power budgets.
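To make the roofline idea mentioned for Pagoda concrete, the following is a minimal sketch of the classic time roofline only: attainable throughput is bounded by either peak compute or by memory bandwidth multiplied by arithmetic intensity. It does not reproduce Pagoda's energy roofline or its analytical model. The `Device` class, the function names, and the hardware numbers are hypothetical placeholders chosen for illustration; they are not Jetson Orin AGX specifications and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical accelerator description (placeholder values, not real specs)."""
    peak_flops: float     # peak compute throughput in FLOP/s
    mem_bandwidth: float  # peak memory bandwidth in bytes/s


def ridge_point(device: Device) -> float:
    """Arithmetic intensity (FLOP/byte) at which a kernel stops being memory-bound."""
    return device.peak_flops / device.mem_bandwidth


def attainable_flops(device: Device, arithmetic_intensity: float) -> float:
    """Classic roofline bound: min(peak compute, bandwidth * intensity)."""
    return min(device.peak_flops, device.mem_bandwidth * arithmetic_intensity)


def roofline_time(device: Device, flops: float, bytes_moved: float) -> float:
    """Lower bound on runtime in seconds: the slower of compute and memory traffic."""
    return max(flops / device.peak_flops, bytes_moved / device.mem_bandwidth)


# Assumed example device: 1 TFLOP/s peak compute, 100 GB/s memory bandwidth.
device = Device(peak_flops=1e12, mem_bandwidth=1e11)

# A layer with 2 FLOP per byte sits left of the ridge point, so it is memory-bound.
low_intensity = attainable_flops(device, arithmetic_intensity=2.0)

# A layer with 50 FLOP per byte sits right of the ridge point, so it is compute-bound.
high_intensity = attainable_flops(device, arithmetic_intensity=50.0)

# Runtime bound for a kernel doing 1 GFLOP of work while moving 1 GB of data.
bound_seconds = roofline_time(device, flops=1e9, bytes_moved=1e9)
```

Under these assumed numbers, a workload below the ridge point is limited by memory traffic, so reducing bytes moved helps more than adding compute. An energy roofline applies the same bounding idea to energy per operation rather than time.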

Sources

Automatic Microarchitecture-Aware Custom Instruction Design for RISC-V Processors

Decentor-V: Lightweight ML Training on Low-Power RISC-V Edge Devices

Lightweight Congruence Profiling for Early Design Exploration of Heterogeneous FPGAs

Chiplet-Based RISC-V SoC with Modular AI Acceleration

Open-source Stand-Alone Versatile Tensor Accelerator

Characterizing the Performance of Accelerated Jetson Edge Devices for Training Deep Learning Models

Pagoda: An Energy and Time Roofline Study for DNN Workloads on Edge Accelerators

Fulcrum: Optimizing Concurrent DNN Training and Inferencing on Edge Accelerators
