Advancements in Large Language Models for Software Development and Testing

The field of software development and testing is seeing rapid progress through the integration of Large Language Models (LLMs). Recent research focuses on improving LLMs' ability to generate high-quality code, on testing LLM-powered software applications, and on evaluating how well those applications behave. One notable direction is the development of frameworks and methods for automated testing and evaluation of LLMs and LLM-based systems, such as interactive evaluation frameworks and guideline-upholding tests. Another is the application of LLMs across the development workflow itself, including code generation, code completion, and code review. Researchers are also exploring the use of LLMs to identify and address potential biases and flaws in software applications. Noteworthy papers in this area include SATORI, which introduces a static test oracle generation approach for REST APIs, and LaQual, which proposes a framework for automated evaluation of LLM app quality. Together, these advances have the potential to make software development and testing processes more efficient, effective, and reliable.
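To make the notion of a static test oracle more concrete, the sketch below shows one way oracles could be derived from an API specification alone, without executing the service. This is a minimal illustration of the general idea, not SATORI's actual method; the endpoint spec, function names, and response data are all hypothetical.

```python
# Minimal sketch (not SATORI's algorithm): derive simple test oracles for a
# REST endpoint statically, i.e. from its specification, then apply them to
# an observed response. All names and the spec fragment are hypothetical.
from typing import Any, Callable, Dict, List

# Hypothetical OpenAPI-style fragment describing GET /users/{id}.
ENDPOINT_SPEC = {
    "responses": {"200", "404"},                   # documented status codes
    "required_fields": {"id": int, "email": str},  # required body fields and types
}

def derive_oracles(spec: Dict[str, Any]) -> List[Callable[[int, Dict[str, Any]], List[str]]]:
    """Turn the static spec into checks runnable against any observed response."""

    def status_oracle(status: int, body: Dict[str, Any]) -> List[str]:
        # The response status must be one the specification documents.
        return [] if str(status) in spec["responses"] else [f"undocumented status {status}"]

    def schema_oracle(status: int, body: Dict[str, Any]) -> List[str]:
        # On a successful response, required fields must be present with the right types.
        if status != 200:
            return []
        errors = []
        for field, ftype in spec["required_fields"].items():
            if field not in body:
                errors.append(f"missing required field '{field}'")
            elif not isinstance(body[field], ftype):
                errors.append(
                    f"field '{field}' has type {type(body[field]).__name__}, "
                    f"expected {ftype.__name__}"
                )
        return errors

    return [status_oracle, schema_oracle]

# Usage: check a hypothetical observed response against the derived oracles.
oracles = derive_oracles(ENDPOINT_SPEC)
violations = [msg for oracle in oracles for msg in oracle(200, {"id": "42"})]
print(violations)  # ["field 'id' has type str, expected int", "missing required field 'email'"]
```

The point of the sketch is that the assertions come from the specification rather than from executing the API, which is what distinguishes a static oracle from one learned or recorded at runtime.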

Sources

Research on intelligent generation of structural demolition suggestions based on multi-model collaboration

From Benchmark Data To Applicable Program Repair: An Experience Report

SATORI: Static Test Oracle Generation for REST APIs

AetherCode: Evaluating LLMs' Ability to Win In Premier Programming Competitions

Syntactic Completions with Material Obligations

Investigating red packet fraud in Android applications: Insights from user reviews

Agent-Testing Agent: A Meta-Agent for Automated Testing and Evaluation of Conversational AI Agents

LLM-Guided Genetic Improvement: Envisioning Semantic Aware Automated Software Evolution

LaQual: A Novel Framework for Automated Evaluation of LLM App Quality

Interactive Evaluation of Large Language Models for Multi-Requirement Software Engineering Tasks

Alignment with Fill-In-the-Middle for Enhancing Code Generation

Functional Consistency of LLM Code Embeddings: A Self-Evolving Data Synthesis Framework for Benchmarking

GUARD: Guideline Upholding Test through Adaptive Role-play and Jailbreak Diagnostics for LLMs

Automated Quality Assessment for LLM-Based Complex Qualitative Coding: A Confidence-Diversity Framework

Rethinking Testing for LLM Applications: Characteristics, Challenges, and a Lightweight Interaction Protocol
