Large Language Models in Compiler Development and Code Analysis

The field of compiler development and code analysis is undergoing a significant shift as large language models (LLMs) are integrated into its workflows. Researchers are exploring the potential of LLMs to automate coding tasks such as generating code from scratch, repairing software, and decompiling binaries. These models are enabling more efficient and accurate compilation tools and are also helping uncover new optimization techniques. Noteworthy papers in this area include:

Adding New Capability in Existing Scientific Application with LLM Assistance, which proposes a methodology for writing code from scratch with LLM assistance.

Build-bench, which evaluates the capability of LLMs to repair build failures in cross-ISA settings.

QiMeng-NeuComBack, which introduces a benchmark dataset for IR-to-assembly compilation and proposes a self-evolving prompt optimization method to improve LLM-generated assembly code.

Context-Guided Decompilation, which proposes a hybrid decompilation framework that uses in-context learning to guide LLMs toward generating re-executable source code.

OMPILOT, which introduces a domain-specific encoder-decoder transformer for translating C++ code into OpenMP, enabling effective shared-memory parallelization (a sketch of this kind of transformation follows this list).

Exploring the Feasibility of End-to-End Large Language Model as a Compiler, which examines whether an LLM can act as an end-to-end compiler and proposes practical architectural designs and future research directions.
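To make the OMPILOT-style source-to-source task concrete, the minimal sketch below pairs a serial C++ loop with a hand-written OpenMP version. The example, function names, and pragma choices are our own illustration of the transformation class, not output or code from the paper.

// Illustrative sketch only: a serial C++ loop and the kind of OpenMP
// parallelization a C++-to-OpenMP translator such as OMPILOT targets.
#include <cstddef>
#include <vector>

// Serial baseline: element-wise a[i] += alpha * b[i].
void saxpy_serial(std::vector<float>& a, const std::vector<float>& b, float alpha) {
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) {
        a[i] += alpha * b[i];
    }
}

// Shared-memory parallel version: the iterations are independent,
// so a simple `parallel for` distributes them across threads.
void saxpy_openmp(std::vector<float>& a, const std::vector<float>& b, float alpha) {
    const std::size_t n = a.size();
    #pragma omp parallel for
    for (std::size_t i = 0; i < n; ++i) {
        a[i] += alpha * b[i];
    }
}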

Sources

Adding New Capability in Existing Scientific Application with LLM Assistance

Can Language Models Go Beyond Coding? Assessing the Capability of Language Models to Build Real-World Systems

QiMeng-NeuComBack: Self-Evolving Translation from IR to Assembly Code

Context-Guided Decompilation: A Step Towards Re-executability

How Different Tokenization Algorithms Impact LLMs and Transformer Models for Binary Code Analysis

OMPILOT: Harnessing Transformer Models for Auto Parallelization to Shared Memory Computing Paradigms

Exploring the Feasibility of End-to-End Large Language Model as a Compiler
