Advancements in Deep Learning System Optimization

The field of deep learning system optimization is increasingly turning to large language models (LLMs) and automated techniques to improve performance, energy efficiency, and portability. Researchers are exploring LLMs to automate energy-aware refactoring of parallel scientific codes, achieving significant energy reductions. Autotuning combined with just-in-time compilation is also being used to deliver portable, state-of-the-art performance without manual code optimization; an illustrative sketch of this approach appears below. In parallel, the reuse of cryptocurrency-mining-specific GPUs for AI workloads is being investigated, with promising results in restoring computational capability and reducing electronic waste. Noteworthy papers include QiMeng-Xpiler, a neural-symbolic transcompiler that automatically translates tensor programs across deep learning systems with high accuracy and performance, and LASSI-EE, an automated LLM-based refactoring framework that generates energy-efficient parallel code, reporting an average energy reduction of 47% across 85% of tested benchmarks.
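To make the autotuning-plus-JIT idea concrete, here is a minimal, hypothetical sketch (not code from any of the cited papers): an empirical autotuner that JIT-compiles one kernel variant per candidate block size with Numba and keeps the fastest. The kernel, the parameter grid, and the helper names (make_kernel, autotune) are illustrative assumptions; the sketch assumes NumPy and Numba are installed.

```python
"""Illustrative sketch: empirical autotuning over JIT-compiled kernel variants.
Hypothetical example; it does not reproduce any specific paper's implementation."""
import time
import numpy as np
from numba import njit, prange


def make_kernel(block):
    # Each candidate block size is baked into its own JIT-specialized kernel,
    # so the compiler can optimize for that configuration.
    @njit(parallel=True, fastmath=True)
    def blocked_sum(x):
        total = 0.0
        n_blocks = (x.size + block - 1) // block
        for b in prange(n_blocks):           # parallel loop over blocks
            start = b * block
            stop = min(start + block, x.size)
            partial = 0.0
            for i in range(start, stop):
                partial += x[i]
            total += partial                  # scalar reduction across blocks
        return total
    return blocked_sum


def autotune(x, candidate_blocks=(1024, 4096, 16384, 65536), repeats=5):
    # Time each JIT-compiled variant and return the fastest configuration.
    best = None
    for block in candidate_blocks:
        kernel = make_kernel(block)
        kernel(x)                             # warm-up run triggers compilation
        t0 = time.perf_counter()
        for _ in range(repeats):
            kernel(x)
        elapsed = (time.perf_counter() - t0) / repeats
        if best is None or elapsed < best[1]:
            best = (block, elapsed)
    return best


if __name__ == "__main__":
    data = np.random.rand(10_000_000)
    block, seconds = autotune(data)
    print(f"best block size: {block}, mean time: {seconds:.4f} s")
```

The same search loop generalizes to richer tuning spaces (tile shapes, unroll factors, thread counts), which is what makes the combination of autotuning and JIT compilation attractive for performance portability across GPUs and CPUs.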

Sources

QiMeng-Xpiler: Transcompiling Tensor Programs for Deep Learning Systems with a Neural-Symbolic Approach

Leveraging LLMs to Automate Energy-Aware Refactoring of Parallel Scientific Codes

GPU Performance Portability needs Autotuning

Exploration of Cryptocurrency Mining-Specific GPUs in AI Applications: A Case Study of CMP 170HX

Can Large Language Models Predict Parallel Code Performance?
