Efficient Deep Learning Inference on Edge Devices

The field of edge computing is moving toward fast, energy-efficient deep learning inference on resource-constrained devices. Recent research focuses on collaborative inference systems, dynamic routing strategies, and novel hardware architectures that reduce latency and energy consumption, with notable advances in parallel computing techniques, silicon photonics, and hardware-software co-design for accelerating deep learning models. These innovations deliver significant gains in performance and efficiency, making them promising for real-time vision-based analytics and human activity recognition applications.

Noteworthy papers:

Intra-DP achieves up to a 50% reduction in per-inference latency and a 75% reduction in energy consumption.

ECORE reduces energy consumption by 45% and latency by 49% while maintaining high detection accuracy.

Opto-ViT reaches 100.4 KFPS/W with up to 84% energy savings and less than 1.6% accuracy loss.

MM2IM accelerates transposed convolutions on FPGA-based edge devices, achieving an average speedup of 1.9x.

TinierHAR introduces an ultra-lightweight deep learning architecture for human activity recognition, reducing parameters by 2.7x and MACs by 6.4x.
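To make the energy-conscious routing idea concrete, here is a minimal sketch in the spirit of ECORE: a router greedily picks the lowest-energy model that still meets an accuracy floor. The ModelProfile fields, the model names and numbers, and the greedy selection rule are illustrative assumptions for this sketch, not ECORE's actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class ModelProfile:
    name: str
    accuracy: float    # expected detection accuracy (0-1); assumed known per model
    energy_mj: float   # estimated energy per inference, millijoules
    latency_ms: float  # estimated latency per inference, milliseconds

def route(candidates: list[ModelProfile],
          min_accuracy: float = 0.80) -> ModelProfile:
    """Pick the lowest-energy model that meets the accuracy floor.

    Falls back to the most accurate model if none qualifies. A real
    router would also account for device load and measure profiles online.
    """
    eligible = [m for m in candidates if m.accuracy >= min_accuracy]
    if not eligible:
        return max(candidates, key=lambda m: m.accuracy)
    # Break energy ties by latency.
    return min(eligible, key=lambda m: (m.energy_mj, m.latency_ms))

# Hypothetical per-model profiles for a detection workload.
candidates = [
    ModelProfile("detector-nano",  accuracy=0.78, energy_mj=12.0, latency_ms=8.0),
    ModelProfile("detector-small", accuracy=0.85, energy_mj=30.0, latency_ms=15.0),
    ModelProfile("detector-large", accuracy=0.91, energy_mj=95.0, latency_ms=42.0),
]
print(route(candidates).name)  # -> detector-small
```

The design point this illustrates: when several models can satisfy the accuracy requirement, routing each request to the cheapest sufficient one is what yields the joint energy and latency savings reported above, rather than always invoking the largest model.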

Sources

Intra-DP: A High Performance Collaborative Inference System for Mobile Edge Computing

ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge

Opto-ViT: Architecting a Near-Sensor Region of Interest-Aware Vision Transformer Accelerator with Silicon Photonics

Accelerating Transposed Convolutions on FPGA-based Edge Devices

TinierHAR: Towards Ultra-Lightweight Deep Learning Models for Efficient Human Activity Recognition on Edge Devices
