Efficient Model Training and Deployment in Resource-Constrained Settings

The field of model training and deployment is moving toward efficient, scalable solutions for resource-constrained settings such as edge devices and low-bandwidth environments. Researchers are exploring alternatives to traditional backpropagation, such as forward-mode automatic differentiation and zero-order optimization, but these methods often carry significant costs in accuracy, convergence speed, and computation. Instead, techniques such as activation checkpointing and progressive precision updates are being developed to improve the efficiency of model training and deployment. These approaches enable the transmission of lower-bit-precision models, reducing bandwidth usage and latency while maintaining competitive accuracy. There is also growing interest in adaptive model quantization, dynamic workload balancing, and private adaptive optimizers to further improve efficiency and scalability.

Noteworthy papers include:

The Cost of Avoiding Backpropagation presents a comprehensive comparison of backpropagation, forward-mode automatic differentiation, and zero-order optimization, highlighting the limitations of the latter two approaches.

FF-INT8 proposes an efficient forward-forward DNN training approach on edge devices with INT8 precision, achieving significant savings in energy and memory usage.

P$^2$U introduces a progressive precision update approach for efficient model distribution, demonstrating a better tradeoff between accuracy, bandwidth usage, and latency.

QPART presents an adaptive model quantization and dynamic workload balancing approach for accuracy-aware edge inference, optimizing layer-wise quantization bit widths and partition points to minimize time consumption and cost.

FlashDP introduces a cache-friendly per-layer DP-SGD approach for private training of large language models, reducing memory movement and redundant computation while achieving high throughput and accuracy.
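To illustrate why forward-mode automatic differentiation is costly for training, the following toy sketch implements it with dual numbers (the class and function names here are illustrative, not from any of the papers above). Each forward pass yields the function value together with a single directional derivative, so recovering a full gradient over n parameters requires on the order of n passes, in contrast to one backward pass with backpropagation.

```python
# Minimal forward-mode AD via dual numbers (toy sketch, hypothetical names).
# One pass computes f(x) and one directional derivative simultaneously.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val = val   # primal value
        self.dot = dot   # tangent (directional derivative)

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)

def f(x, y):
    return x * x + x * y  # df/dx = 2x + y

# Seed the tangent along e_x = (1, 0): one forward pass gives df/dx.
out = f(Dual(3.0, 1.0), Dual(2.0, 0.0))
print(out.val, out.dot)  # 15.0 8.0
```

Getting df/dy would require a second pass seeded along e_y, which is the scaling cost that The Cost of Avoiding Backpropagation quantifies.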
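The progressive precision update idea behind P$^2$U can be sketched as follows: transmit a coarse low-bit version of the weights first, then send quantized residuals so the receiver refines the model in stages. This is a loose NumPy illustration under assumed uniform symmetric quantization, not the paper's actual scheme or API.

```python
# Sketch of a progressive precision update (illustrative assumptions):
# stage 1 ships a 4-bit model, stage 2 ships an 8-bit residual correction.
import numpy as np

def quantize(w, bits):
    """Uniform symmetric quantization to the given bit width."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    q = np.round(w / scale).astype(np.int32)
    return q, scale

rng = np.random.default_rng(0)
weights = rng.standard_normal(1000).astype(np.float32)

# Stage 1: a 4-bit payload that is small and immediately usable.
q4, s4 = quantize(weights, bits=4)
coarse = q4 * s4

# Stage 2: quantize the residual at 8 bits and refine in place.
residual = weights - coarse
q8, s8 = quantize(residual, bits=8)
refined = coarse + q8 * s8

# The staged reconstruction is strictly closer to the full-precision weights.
print(np.abs(weights - refined).max() < np.abs(weights - coarse).max())
```

The bandwidth/accuracy tradeoff comes from choosing how many refinement stages to send before stopping.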
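The per-layer DP-SGD computation that FlashDP accelerates can be sketched in a few lines: clip each example's gradient for a layer, sum, add Gaussian noise, and step. The function name, shapes, and hyperparameters below are illustrative assumptions, not FlashDP's implementation.

```python
# Toy per-layer DP-SGD step (illustrative, not FlashDP's API):
# per-example clipping bounds each example's influence; Gaussian noise
# provides the differential-privacy guarantee.
import numpy as np

def dp_sgd_layer_update(per_example_grads, clip_norm=1.0, noise_mult=1.1,
                        lr=0.1, rng=np.random.default_rng(0)):
    """per_example_grads: array of shape (batch, *param_shape) for one layer."""
    batch = per_example_grads.shape[0]
    flat = per_example_grads.reshape(batch, -1)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    # Scale each example's gradient down so its norm is at most clip_norm.
    clipped = flat * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    summed = clipped.sum(axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm, size=summed.shape)
    noisy_mean = (summed + noise) / batch
    return -lr * noisy_mean.reshape(per_example_grads.shape[1:])

grads = np.ones((8, 4, 4))           # batch of 8 examples, one 4x4 weight layer
delta = dp_sgd_layer_update(grads)   # weight update for this layer
print(delta.shape)                   # (4, 4)
```

The memory-movement cost FlashDP targets comes from materializing those per-example gradients; a cache-friendly per-layer schedule processes them layer by layer instead of holding them all at once.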

Sources

The Cost of Avoiding Backpropagation

FF-INT8: Efficient Forward-Forward DNN Training on Edge Devices with INT8 Precision

P$^2$U: Progressive Precision Update For Efficient Model Distribution

QPART: Adaptive Model Quantization and Dynamic Workload Balancing for Accuracy-aware Edge Inference

Plan-Based Scalable Online Virtual Network Embedding

On Design Principles for Private Adaptive Optimizers

FlashDP: Private Training Large Language Models with Efficient DP-SGD

Handling out-of-order input arrival in CEP engines on the edge combining optimistic, pessimistic and lazy evaluation
