Advancements in Deep Learning System Optimization

The field of deep learning system optimization is increasingly turning to large language models (LLMs) and automated techniques to improve performance, energy efficiency, and portability. Researchers are exploring LLMs to automate energy-aware refactoring of parallel scientific codes, achieving significant energy reductions. Autotuning combined with just-in-time compilation is also being used to deliver portable, state-of-the-art performance without manual code optimization; an illustrative sketch of this approach appears below. In parallel, the reuse of cryptocurrency-mining-specific GPUs for AI workloads is being investigated, with promising results in restoring computational capability and reducing electronic waste. Noteworthy papers include QiMeng-Xpiler, a neural-symbolic transcompiler that automatically translates tensor programs across deep learning systems with high accuracy and performance, and LASSI-EE, an automated LLM-based refactoring framework that generates energy-efficient parallel code, reporting an average energy reduction of 47% across 85% of tested benchmarks.
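To make the autotuning-plus-JIT idea concrete, here is a minimal, hypothetical sketch (not code from any of the cited papers): an empirical autotuner that JIT-compiles one kernel variant per candidate block size with Numba and keeps the fastest. The kernel, the parameter grid, and the helper names (make_kernel, autotune) are illustrative assumptions; the sketch assumes NumPy and Numba are installed.

```python
"""Illustrative sketch: empirical autotuning over JIT-compiled kernel variants.
Hypothetical example; it does not reproduce any specific paper's implementation."""
import time
import numpy as np
from numba import njit, prange


def make_kernel(block):
    # Each candidate block size is baked into its own JIT-specialized kernel,
    # so the compiler can optimize for that configuration.
    @njit(parallel=True, fastmath=True)
    def blocked_sum(x):
        total = 0.0
        n_blocks = (x.size + block - 1) // block
        for b in prange(n_blocks):           # parallel loop over blocks
            start = b * block
            stop = min(start + block, x.size)
            partial = 0.0
            for i in range(start, stop):
                partial += x[i]
            total += partial                  # scalar reduction across blocks
        return total
    return blocked_sum


def autotune(x, candidate_blocks=(1024, 4096, 16384, 65536), repeats=5):
    # Time each JIT-compiled variant and return the fastest configuration.
    best = None
    for block in candidate_blocks:
        kernel = make_kernel(block)
        kernel(x)                             # warm-up run triggers compilation
        t0 = time.perf_counter()
        for _ in range(repeats):
            kernel(x)
        elapsed = (time.perf_counter() - t0) / repeats
        if best is None or elapsed < best[1]:
            best = (block, elapsed)
    return best


if __name__ == "__main__":
    data = np.random.rand(10_000_000)
    block, seconds = autotune(data)
    print(f"best block size: {block}, mean time: {seconds:.4f} s")
```

The same search loop generalizes to richer tuning spaces (tile shapes, unroll factors, thread counts), which is what makes the combination of autotuning and JIT compilation attractive for performance portability across GPUs and CPUs.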

Sources

QiMeng-Xpiler: Transcompiling Tensor Programs for Deep Learning Systems with a Neural-Symbolic Approach

Leveraging LLMs to Automate Energy-Aware Refactoring of Parallel Scientific Codes

GPU Performance Portability needs Autotuning

Exploration of Cryptocurrency Mining-Specific GPUs in AI Applications: A Case Study of CMP 170HX

Can Large Language Models Predict Parallel Code Performance?
