Advancements in Software Testing with Large Language Models

The field of software testing is being transformed by the integration of Large Language Models (LLMs). Recent work shows a shift toward leveraging LLMs for automated test generation, test refinement, and code analysis, supported by new benchmarks and frameworks that assess and improve LLM capabilities in software testing. These efforts target challenges such as test quality, coverage, and maintainability, with the ultimate aim of enhancing the reliability and security of software systems. Notable papers include FeatBench, a benchmark for evaluating coding agents on feature implementation; JUnitGenie, a path-sensitive framework for unit test generation with LLMs; TENET, an LLM agent that generates functions in a Test-Driven Development setting and reports improved performance over existing baselines; DiffTester, which accelerates unit test generation for diffusion LLMs; and RefFilter, which improves semantic conflict detection via refactoring-aware static analysis. Together, these contributions underscore the potential of LLMs to reshape software testing practice.
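
To make the general idea concrete, the sketch below shows a minimal generate-and-validate loop of the kind these systems build on: prompt an LLM for tests, execute the candidates, and retry with feedback on failure. This is an illustrative assumption, not the method of any paper above; `call_llm`, `build_prompt`, and `validate_tests` are hypothetical names, and real frameworks add far richer context (e.g., path conditions or error-type-aware refinement).

```python
import subprocess
import tempfile
import textwrap
from pathlib import Path


def build_prompt(function_source: str) -> str:
    """Assemble a simple test-generation prompt (illustrative format only)."""
    return textwrap.dedent(f"""\
        Write pytest unit tests for the following Python function,
        which is defined in a module named `module_under_test`.
        Cover typical inputs, boundary values, and error cases.

        {function_source}
        """)


def validate_tests(test_code: str, module_source: str) -> bool:
    """Run candidate tests in a scratch directory and report pass/fail."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "module_under_test.py").write_text(module_source)
        Path(tmp, "test_generated.py").write_text(test_code)
        result = subprocess.run(
            ["pytest", "-q", "test_generated.py"],
            cwd=tmp, capture_output=True, text=True,
        )
        return result.returncode == 0


def generate_and_check(call_llm, module_source: str, retries: int = 3):
    """Query the model, validate its output, and feed failures back.

    `call_llm` is a hypothetical stand-in for any chat-completion API
    that maps a prompt string to a string of generated test code.
    """
    prompt = build_prompt(module_source)
    for _ in range(retries):
        candidate = call_llm(prompt)
        if validate_tests(candidate, module_source):
            return candidate
        prompt += "\n\nThe previous tests failed to run; please fix them."
    return None
```

The retry step, which appends execution feedback to the prompt, loosely mirrors the refinement loops described in this line of work, where failing or low-quality tests are iteratively repaired rather than discarded.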

Sources

FeatBench: Evaluating Coding Agents on Feature Implementation for Vibe Coding

Navigating the Labyrinth: Path-Sensitive Unit Test Generation with Large Language Models

TENET: Leveraging Tests Beyond Validation for Code Generation

Unit Test Update through LLM-Driven Context Collection and Error-Type-Aware Refinement

DiffTester: Accelerating Unit Test Generation for Diffusion LLMs via Repetitive Pattern

Large Language Models for Software Testing: A Research Roadmap

Protocode: Prototype-Driven Interpretability for Code Generation in LLMs

Are Classical Clone Detectors Good Enough For the AI Era?

Hamster: A Large-Scale Study and Characterization of Developer-Written Tests

EQ-Robin: Generating Multiple Minimal Unique-Cause MC/DC Test Suites

Beyond Pass/Fail: The Story of Learning-Based Testing

PyTrim: A Practical Tool for Reducing Python Dependency Bloat

Enhancing Software Testing Education: Understanding Where Students Struggle

CodeGenLink: A Tool to Find the Likely Origin and License of Automatically Generated Code

RefFilter: Improving Semantic Conflict Detection via Refactoring-Aware Static Analysis

Clarifying Semantics of In-Context Examples for Unit Test Generation
