Advances in Large Language Models for Code Generation and Verification

The field of software engineering is witnessing significant advances in the application of large language models (LLMs) to code generation and verification. Recent studies highlight the potential of LLMs to automate code generation tasks, but also reveal limitations and challenges, such as hallucinations and systematic failures when verifying code against natural language specifications. Researchers are actively exploring ways to mitigate these issues, including improved prompting strategies and new benchmark datasets. Noteworthy papers in this area include:

- Hallucination in LLM-Based Code Generation: An Automotive Case Study investigates hallucination phenomena in code generation and highlights the need for effective mitigation techniques.
- Defects4Log: Benchmarking LLMs for Logging Code Defect Detection and Reasoning proposes a comprehensive taxonomy of logging code defects and evaluates LLMs' ability to detect and reason about them.
- Benchmark Dataset Generation and Evaluation for Excel Formula Repair with LLMs introduces a novel approach to constructing a benchmark dataset for Excel formula repair and evaluates the performance of various LLMs on it.
- Uncovering Systematic Failures of LLMs in Verifying Code Against Natural Language Specifications uncovers systematic failures of LLMs in evaluating code correctness and proposes improved prompting strategies for mitigation.
- Type-Driven Prompt Programming: From Typed Interfaces to a Calculus of Constraints introduces a dependently typed calculus for prompt programming and highlights underexplored areas in constraint expressiveness.
- Preguss: It Analyzes, It Specifies, It Verifies outlines a modular framework for automating the generation and refinement of formal specifications.
- Software Model Checking via Summary-Guided Search describes a new software model-checking algorithm that treats model checking a program as a directed search of its program states.
- Compositional Symbolic Execution for the Next 700 Memory Models provides a new formal foundation for memory-model-parametric compositional symbolic execution platforms.
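To make the prompting-strategy idea concrete, here is a minimal sketch of a structured verification prompt. It is an illustration of the general decompose-then-judge style that such mitigations take, not the specific prompts proposed in the paper; the function name and wording are hypothetical.

```python
# Hypothetical sketch: a structured prompt for checking code against a
# natural language specification. The decompose-then-judge format (list
# requirements, check each, only then give a verdict) illustrates one
# family of improved prompting strategies; it is not the paper's exact prompt.
def build_verification_prompt(spec: str, code: str) -> str:
    """Ask the model to check each spec clause before issuing a verdict."""
    return (
        "You are verifying code against a natural language specification.\n"
        "1. List every requirement stated in the specification.\n"
        "2. For each requirement, quote the code that satisfies or violates it.\n"
        "3. Only then output a final verdict: CORRECT or INCORRECT.\n\n"
        f"Specification:\n{spec}\n\n"
        f"Code:\n{code}\n"
    )

prompt = build_verification_prompt(
    "Return the absolute value of x.",
    "def f(x):\n    return x if x > 0 else -x",
)
```

Forcing the model to enumerate requirements before judging is intended to counter the failure mode where it declares code correct without actually cross-checking the specification.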
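The model-checking-as-search framing can be illustrated with a toy worklist search over program states. This is a generic reachability check under simplifying assumptions (finite states, explicit enumeration), not the summary-guided algorithm itself.

```python
from collections import deque

# Toy illustration (not the paper's summary-guided algorithm): model
# checking a safety property as a directed search over program states.
# A state is any hashable value; `step` yields successor states and
# `is_bad` flags property violations.
def reaches_violation(init, step, is_bad, limit=1000):
    """Breadth-first search for a reachable bad state, up to `limit` states."""
    seen, frontier = {init}, deque([init])
    while frontier and limit > 0:
        limit -= 1
        state = frontier.popleft()
        if is_bad(state):
            return True
        for succ in step(state):
            if succ not in seen:
                seen.add(succ)
                frontier.append(succ)
    return False

# Program "while x < 3: x += 1" can never drive x past 3.
safe_step = lambda x: [x + 1] if x < 3 else []
assert reaches_violation(0, safe_step, lambda x: x > 3) is False
```

Summary-guided search refines this picture by using summaries of explored behavior to direct which states the search expands next, rather than exploring blindly in breadth-first order.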

Sources

Hallucination in LLM-Based Code Generation: An Automotive Case Study

Defects4Log: Benchmarking LLMs for Logging Code Defect Detection and Reasoning

Benchmark Dataset Generation and Evaluation for Excel Formula Repair with LLMs

Uncovering Systematic Failures of LLMs in Verifying Code Against Natural Language Specifications

Type-Driven Prompt Programming: From Typed Interfaces to a Calculus of Constraints

Preguss: It Analyzes, It Specifies, It Verifies

An Investigation Into Secondary School Students' Debugging Behaviour in Python

Software Model Checking via Summary-Guided Search (Extended Version)

Compositional Symbolic Execution for the Next 700 Memory Models (Extended Version)
