Efficient Deployment of Deep Learning Models on Edge Devices

The field of deep learning is moving toward efficient deployment of models on edge devices, with a focus on preserving user privacy and reducing computational overhead. Recent work introduces novel methods for dataset sampling, attention optimization, and model scheduling that enable large language models and other complex architectures to run on resource-constrained hardware. These advances stand to significantly improve the performance and efficiency of edge-based deep learning applications. Notable papers in this area include AdapSNE, which proposes an adaptive, entropy-guided dataset sampling method for edge DNN training, and CoFormer, which introduces a collaborative inference system for scalable transformer inference across heterogeneous edge devices. Additionally, Zen-Attention contributes a compiler framework for dynamic attention folding on AMD NPUs, while Puzzle addresses scheduling multiple deep learning models on mobile devices with heterogeneous processors.

Sources

AdapSNE: Adaptive Fireworks-Optimized and Entropy-Guided Dataset Sampling for Edge DNN Training

Dynamic Sparse Attention on Mobile SoCs

Zen-Attention: A Compiler Framework for Dynamic Attention Folding on AMD NPUs

Characterizing the Behavior of Training Mamba-based State Space Models on GPUs

Puzzle: Scheduling Multiple Deep Learning Models on Mobile Device with Heterogeneous Processors

CoFormer: Collaborating with Heterogeneous Edge Devices for Scalable Transformer Inference

Uncovering the Spectral Bias in Diagonal State Space Models
