Advances in Code Intelligence and Automated Software Engineering

The field of code intelligence and automated software engineering is evolving rapidly, with research focused on improving the accuracy and efficiency of software development and maintenance. Recent work applies large language models (LLMs) and machine learning to fault localization, automated program repair, and code generation. Notably, augmenting LLMs with external memory and project-specific context has yielded significant gains in fault localization, while combining probabilistic methods with LLMs has advanced software reverse engineering. In addition, cross-lingual retrieval-augmented code generation has proven effective for migrating codebases across programming languages.
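To make the cross-lingual retrieval-augmented idea concrete, here is a minimal, hypothetical sketch of the retrieval step: given a source-language snippet, retrieve the most similar snippet from a translation-pair corpus and prepend its known target-language translation as an in-context example for a code-generation LLM. The corpus, the Jaccard-over-tokens retriever, and all names are illustrative stand-ins, not the method of any paper listed below.

```python
import re

def tokens(code: str) -> set[str]:
    # Crude tokenizer: split on non-alphanumerics (stand-in for a real lexer).
    return set(re.findall(r"\w+", code))

def jaccard(a: set[str], b: set[str]) -> float:
    # Token-set overlap as a cheap similarity score (stand-in for a real retriever).
    return len(a & b) / len(a | b) if a | b else 0.0

# Tiny hypothetical corpus of Python -> Java translation pairs.
corpus = [
    ("def add(a, b): return a + b",
     "int add(int a, int b) { return a + b; }"),
    ("def read_lines(path): return open(path).readlines()",
     "List<String> readLines(Path p) throws IOException { return Files.readAllLines(p); }"),
]

def build_prompt(query: str) -> str:
    # Retrieve the most similar source snippet and use its paired translation
    # as an in-context example; the LLM call itself is not shown here.
    src, tgt = max(corpus, key=lambda pair: jaccard(tokens(query), tokens(pair[0])))
    return (f"# Example\nPython: {src}\nJava: {tgt}\n\n"
            f"# Task\nPython: {query}\nJava:")

prompt = build_prompt("def sub(a, b): return a - b")
print(prompt)
```

Because the query shares most of its tokens with the `add` pair, that pair is retrieved, so the prompt pairs the new function with a structurally analogous translated example.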

Two notable papers illustrate these trends. LPASS introduces a linear-probe approach to estimating how well compressed LLMs will perform on vulnerability detection, achieving 86.9% accuracy in multi-class vulnerability detection. VietMix presents a naturally occurring Vietnamese-English code-mixed corpus together with an iterative augmentation methodology, reaching translation quality-estimation scores of up to 71.84 on COMETkiwi and 81.77 on XCOMET.
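The linear-probe idea behind LPASS can be sketched with a minimal, self-contained example: train a linear classifier on frozen hidden-state features and use its held-out accuracy as a cheap proxy for how much task-relevant signal a (compressed) model layer retains. The synthetic features, dimensions, and training loop below are illustrative assumptions, not LPASS's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for hidden states from one layer of a compressed LLM:
# one feature vector per code snippet, labeled with a vulnerability class.
n, d, k = 300, 32, 3
means = rng.normal(scale=2.0, size=(k, d))   # one cluster center per class
y = rng.integers(0, k, size=n)
X = means[y] + rng.normal(size=(n, d))       # features = center + noise
Xtr, ytr, Xte, yte = X[:200], y[:200], X[200:], y[200:]

# The "probe": softmax regression on frozen features, fit by gradient descent.
W = np.zeros((d, k))
b = np.zeros(k)
onehot = np.eye(k)[ytr]
for _ in range(500):
    logits = Xtr @ W + b
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    grad = (p - onehot) / len(Xtr)                   # cross-entropy gradient
    W -= 0.5 * Xtr.T @ grad
    b -= 0.5 * grad.sum(axis=0)

# Held-out probe accuracy serves as the layer's quality estimate.
acc = ((Xte @ W + b).argmax(axis=1) == yte).mean()
print(f"probe accuracy: {acc:.2f}")
```

The appeal of such probes is cost: fitting a linear head over cached activations is far cheaper than fine-tuning or fully evaluating each compressed model variant.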

Sources

Principal Context-aware Diffusion Guided Data Augmentation for Fault Localization

LPASS: Linear Probes as Stepping Stones for vulnerability detection using compressed LLMs

VietMix: A Naturally Occurring Vietnamese-English Code-Mixed Corpus with Iterative Augmentation for Machine Translation

Rethinking the effects of data contamination in Code Intelligence

Empirical Evaluation of Generalizable Automated Program Repair with Large Language Models

Fault Localisation and Repair for DL Systems: An Empirical Study with LLMs

Beyond C/C++: Probabilistic and LLM Methods for Next-Generation Software Reverse Engineering

Across Programming Language Silos: A Study on Cross-Lingual Retrieval-augmented Code Generation

Improving LLM-Based Fault Localization with External Memory and Project Context

Multi-Language Detection of Design Pattern Instances

Characterizing Multi-Hunk Patches: Divergence, Proximity, and LLM Repair Challenges

Hiding in Plain Sight: Query Obfuscation via Random Multilingual Searches

A Multi-Dataset Evaluation of Models for Automated Vulnerability Repair
