The field of edge AI is moving toward deploying complex models on severely resource-constrained hardware, with a focus on efficient inference and real-time applications. Researchers are working to optimize model performance, reduce latency, and minimize energy consumption through techniques such as hierarchical speculative decoding, adaptive core selection, and quantization. Noteworthy developments include the emergence of Tiny Deep Learning, which brings deep neural networks to edge devices, and end-to-end optimization frameworks that combine these techniques. These advances have significant implications for applications in computer vision, audio recognition, healthcare, and industrial monitoring. Notable papers include:
- From Tiny Machine Learning to Tiny Deep Learning: A Survey, which provides a comprehensive overview of the transition from TinyML to TinyDL.
- LLMs on a Budget? Say HOLA, which introduces an end-to-end optimization framework for efficient LLM deployment.
- WiLLM: An Open Wireless LLM Communication System, which proposes an innovative wireless system specifically designed for LLM services.
- MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection, which reduces LLM decoding energy while keeping decode speed within an acceptable slowdown threshold.
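Of the techniques named above, quantization is the most broadly applied on edge hardware. As a minimal illustration (not drawn from any of the listed papers), the sketch below shows symmetric per-tensor int8 post-training quantization: float weights are scaled into the signed 8-bit range and dequantized back with a single scale factor, trading a small reconstruction error for a 4x reduction in weight storage. The function names are illustrative, not from any specific framework.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0  # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Toy weight tensor: after round-tripping, each element differs from the
# original by at most scale / 2 (the rounding error bound).
w = np.array([0.5, -1.27, 0.003, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
```

Real deployments (e.g. the frameworks surveyed in the TinyDL paper) typically refine this with per-channel scales and calibration data, but the storage/accuracy trade-off is the same.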