Research in code intelligence and automated software engineering is moving quickly, with an emphasis on making software development and maintenance more accurate and efficient. Recent work applies large language models (LLMs) and other machine learning techniques to several software engineering tasks, including fault localization, automated program repair, and code generation. Notably, augmenting LLMs with external memory and project-specific knowledge has significantly improved fault localization, and combining probabilistic methods with LLMs has strengthened software reverse engineering. Cross-lingual retrieval-augmented code generation has also proved effective for migrating codebases between programming languages.
Notable papers in this area include LPASS and VietMix. LPASS introduces a linear-probe approach for estimating the performance of compressed LLMs on vulnerability detection, achieving 86.9% accuracy in multi-class vulnerability detection. VietMix presents a Vietnamese-English code-mixed corpus and an iterative augmentation methodology, yielding translation quality estimation scores of up to 71.84 on COMETkiwi and 81.77 on XCOMET.