Advances in Secure and Efficient Software Development with LLMs

The field of software development is seeing rapid progress through the integration of Large Language Models (LLMs). Recent work concentrates on making LLM-based systems more secure and efficient, particularly for code generation, issue localization, and performance optimization. Novel approaches such as RepoLens, SecureAgentBench, and SemGuard show promising results in handling concern mixing and scattering in large-scale repositories, evaluating secure code generation under realistic vulnerability scenarios, and correcting semantic errors in LLM-generated code. New benchmarks, including PerfBench, BuildBench, and MULocBench, make it possible to evaluate LLM agents on resolving real-world performance bugs, compiling real-world open-source software, and localizing both code and non-code issues. Complementary research on explainable fault localization, automated environment setup, and the security assessment of AI code agents underscores the need for more robust and reliable systems. Overall, the field is moving toward more secure, efficient, and transparent software development practices supported by LLMs. Two papers stand out: RepoLens, which improves issue localization by abstracting and leveraging conceptual knowledge from code repositories, and SemGuard, which performs real-time semantic supervision to correct semantic errors in LLM-generated code.
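To make the idea of semantic supervision concrete, the sketch below shows a generic generate-check-repair loop: candidate code from a model is run against lightweight input/output checks, and any concrete failures are fed back as repair context. This is only an illustration of the general pattern, not SemGuard's actual algorithm; the `llm` callable, the `solution` function name, and the check format are assumptions made for the example.

```python
"""Minimal sketch of a generate-check-repair loop (not the SemGuard method)."""
from typing import Callable, List, Tuple

# Each check is (args, expected_result) for a generated function named `solution`.
Check = Tuple[tuple, object]


def semantic_errors(code: str, checks: List[Check]) -> List[str]:
    """Run the candidate code against lightweight I/O checks and collect failures."""
    namespace: dict = {}
    try:
        exec(code, namespace)          # assumption: the code defines `solution(...)`
        fn = namespace["solution"]
    except Exception as exc:           # syntax errors, missing definition, etc.
        return [f"code failed to load: {exc!r}"]

    failures = []
    for args, expected in checks:
        try:
            got = fn(*args)
        except Exception as exc:
            failures.append(f"solution{args} raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"solution{args} returned {got!r}, expected {expected!r}")
    return failures


def supervised_generation(llm: Callable[[str], str], task: str,
                          checks: List[Check], max_rounds: int = 3) -> str:
    """Generate code, surface semantic failures to the model, and retry."""
    code = llm(task)
    for _ in range(max_rounds):
        failures = semantic_errors(code, checks)
        if not failures:
            return code                # all checks pass: accept the candidate
        # Feed the concrete failures back as repair context.
        prompt = f"{task}\n\nYour previous code failed these checks:\n" + "\n".join(failures)
        code = llm(prompt)
    return code                        # best effort after max_rounds


if __name__ == "__main__":
    # Toy stand-in for an LLM: the first answer is semantically wrong, the second is fixed.
    answers = iter([
        "def solution(x, y):\n    return x - y\n",   # wrong operator
        "def solution(x, y):\n    return x + y\n",
    ])
    fake_llm = lambda prompt: next(answers)

    checks = [((2, 3), 5), ((0, 0), 0)]
    print(supervised_generation(fake_llm, "Write solution(x, y) returning x + y.", checks))
```

The same loop structure also conveys why such supervision matters for the benchmarks above: a candidate that compiles can still fail semantic or security checks, and surfacing those failures during generation is cheaper than discovering them after the fact.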

Sources

Extracting Conceptual Knowledge to Locate Software Issues

"Your AI, My Shell": Demystifying Prompt Injection Attacks on Agentic AI Coding Editors

SecureAgentBench: Benchmarking Secure Code Generation under Realistic Vulnerability Scenarios

Automated Vulnerability Validation and Verification: A Large Language Model Approach

PerfBench: Can Agents Resolve Real-World Performance Bugs?

Takedown: How It's Done in Modern Coding Agent Exploits

SemGuard: Real-Time Semantic Evaluator for Correcting LLM-Generated Code

WARP -- Web-Augmented Real-time Program Repairer: A Real-Time Compilation Error Resolution using LLMs and Web-Augmented Synthesis

A Benchmark for Localizing Code and Non-Code Issues in Software Projects

BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software

PIPer: On-Device Environment Setup via Online Reinforcement Learning

Explainable Fault Localization for Programming Assignments via LLM-Guided Annotation

Red Teaming Program Repair Agents: When Correct Patches can Hide Vulnerabilities

LSPFuzz: Hunting Bugs in Language Servers

Improving Code Localization with Repository Memory

Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks

Deciphering WONTFIX: A Mixed-Method Study on Why GitHub Issues Get Rejected
