Efficient Deep Neural Networks for Edge Devices

The field of deep neural networks is shifting toward models efficient enough to deploy on edge devices with limited resources. Researchers are exploring techniques such as compression, pruning, and quantization to reduce the memory footprint and computational cost of these models; a generic sketch of pruning and quantization follows the paper list below. One key trend is the use of explainable AI (XAI) methods to understand the inner workings of deep neural networks and to identify where compression can be applied without sacrificing accuracy. Another area of focus is dynamic activation frameworks that compress activations efficiently during training, making on-device training of deep neural networks practical. Finally, unstructured inference-time pruning methods dynamically identify and skip unnecessary operations during inference, cutting computational cost and energy consumption. Notable papers in this area include:

  • Compressing Deep Neural Networks Using Explainable AI, which proposes a novel compression approach that uses XAI to reduce model size with negligible accuracy loss (the relevance-guided pruning sketch below illustrates the general idea).
  • Secure and Storage-Efficient Deep Learning Models for Edge AI Using Automatic Weight Generation, which introduces a framework for dynamic weight generation and compression, achieving significant memory savings and improved security.
  • DAF: An Efficient End-to-End Dynamic Activation Framework for on-Device DNN Training, which enables efficient on-device training through system-level optimizations, achieving substantial memory savings and speedup (see the compressed-activation sketch below).
  • UnIT: Scalable Unstructured Inference-Time Pruning for MAC-efficient Neural Inference on MCUs, which introduces a lightweight method for dynamic pruning during inference, substantially reducing multiply-accumulate (MAC) counts and energy consumption (see the MAC-skipping sketch below).
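
To ground the general techniques, here is a minimal sketch of magnitude-based weight pruning followed by uniform symmetric 8-bit quantization in NumPy. This is a generic illustration of the compression pipeline, not code from any of the papers above; the sparsity level and bit width are arbitrary example choices.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` fraction are zero."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def quantize_uniform(weights: np.ndarray, bits: int = 8):
    """Uniform symmetric quantization to `bits`-bit integers plus one float scale."""
    scale = np.abs(weights).max() / (2 ** (bits - 1) - 1)
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
w_sparse = magnitude_prune(w, sparsity=0.7)    # 70% of weights become zero
q, scale = quantize_uniform(w_sparse, bits=8)  # int8 storage: 4x smaller than float32
error = np.abs(dequantize(q, scale) - w_sparse).max()
print(f"max dequantization error: {error:.4f}")
```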
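
The XAI-guided compression idea can be sketched as scoring each convolutional channel with an attribution measure and dropping the least relevant channels. The PyTorch snippet below uses a first-order Taylor saliency (mean |activation × gradient|), which is one common XAI-style relevance score; it is an illustrative approximation of the general approach, not the paper's actual method.

```python
import torch
import torch.nn as nn

def channel_relevance(activations: torch.Tensor, gradients: torch.Tensor) -> torch.Tensor:
    """First-order Taylor saliency per output channel: mean |a * dL/da|.
    `activations`/`gradients`: (batch, channels, H, W) from a conv layer."""
    return (activations * gradients).abs().mean(dim=(0, 2, 3))

def prune_channels(conv: nn.Conv2d, relevance: torch.Tensor, keep_ratio: float = 0.5) -> nn.Conv2d:
    """Keep only the most relevant output channels of `conv`."""
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    keep = relevance.topk(n_keep).indices.sort().values
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
    return pruned
```

In practice the activations and gradients would be captured with forward and backward hooks over a calibration batch before calling `prune_channels`.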
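
For the dynamic-activation direction, one common building block is to quantize the activations saved for the backward pass and dequantize them only when gradients are computed, trading a little precision for memory. The sketch below shows this compressed-activation pattern with a custom PyTorch autograd function; it illustrates the general memory-saving mechanism only, not DAF's specific system-level design.

```python
import torch

class MemEfficientLinear(torch.autograd.Function):
    """Linear op that saves its input activation as int8 (plus one float scale)
    for the backward pass, instead of retaining the full float32 tensor."""

    @staticmethod
    def forward(ctx, x, weight):
        scale = x.abs().max().clamp(min=1e-8) / 127.0
        x_q = (x / scale).round().to(torch.int8)  # compressed copy for backward
        ctx.save_for_backward(x_q, weight)
        ctx.scale = scale.item()
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        x_q, weight = ctx.saved_tensors
        x_approx = x_q.to(grad_out.dtype) * ctx.scale  # dequantize on demand
        grad_x = grad_out @ weight
        grad_w = grad_out.t() @ x_approx
        return grad_x, grad_w

x = torch.randn(32, 256, requires_grad=True)
w = torch.randn(128, 256, requires_grad=True)
out = MemEfficientLinear.apply(x, w)
out.sum().backward()
print(x.grad.shape, w.grad.shape)  # torch.Size([32, 256]) torch.Size([128, 256])
```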
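
Finally, inference-time unstructured pruning can be illustrated as skipping MACs whose activation input is near zero, so the work performed adapts to each input. The deliberately naive scalar loop below shows the MAC-skipping pattern (a real microcontroller kernel would implement it far more carefully); the threshold is an arbitrary example value, and this is not UnIT's actual algorithm.

```python
import numpy as np

def pruned_dot(x: np.ndarray, w: np.ndarray, threshold: float = 0.05):
    """Dot product that skips MACs where the activation is near zero.
    Returns the (approximate) result and the fraction of MACs skipped."""
    acc, skipped = 0.0, 0
    for xi, wi in zip(x, w):
        if abs(xi) < threshold:  # contribution deemed negligible: skip the MAC
            skipped += 1
            continue
        acc += xi * wi
    return acc, skipped / len(x)

rng = np.random.default_rng(0)
x = np.maximum(rng.standard_normal(512), 0.0)  # ReLU-like sparse activations
w = rng.standard_normal(512)
approx, skip_frac = pruned_dot(x, w, threshold=0.05)
exact = float(x @ w)
print(f"exact={exact:.3f} approx={approx:.3f} skipped={skip_frac:.0%} of MACs")
```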

Sources

Compressing Deep Neural Networks Using Explainable AI

Secure and Storage-Efficient Deep Learning Models for Edge AI Using Automatic Weight Generation

DAF: An Efficient End-to-End Dynamic Activation Framework for on-Device DNN Training

UnIT: Scalable Unstructured Inference-Time Pruning for MAC-efficient Neural Inference on MCUs
