The field of automated code generation and verification is moving towards more reliable and efficient methods. Researchers are exploring large language models, formal methods, and multi-agent frameworks to improve the quality of generated code and to ensure its correctness. One notable trend is the development of datasets and benchmarks for systematically evaluating code generation and verification tools. Another area of focus is the application of AI and machine learning to simplify complex statutory and regulatory language and to derive behavioral specifications from legal texts.
Noteworthy papers in this area include: RePro, a reflective paper-to-code reproduction framework that opens a 13.0% performance gap over baselines; ReDeFo, a multi-agent framework for reliable code generation that incorporates formal methods to strengthen quality assurance; CASP, a curated evaluation dataset of C code paired with formal specifications that enables systematic benchmarking of automated code generation and verification tools (what such a pairing looks like is sketched below); LaborBench, a question-and-answer benchmark dataset designed to evaluate AI capabilities in statutory simplification; Solvable Tuple Patterns, a method for expressing invariants between list-like recursive data structures that can be efficiently inferred from positive samples (see the example invariant below); and From Law to Gherkin, a human-centred quasi-experiment on the quality of LLM-generated behavioral specifications derived from food-safety regulations.
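To make concrete what "C code paired with formal specifications" means in a dataset like CASP, the sketch below pairs a small C function with an ACSL contract of the kind discharged by deductive verifiers such as Frama-C. This is a minimal illustration, not an entry from the dataset: the function max_element is hypothetical, and the assumption that CASP-style specifications resemble ACSL is ours.

```c
#include <stddef.h>

/* Hypothetical example of a C function paired with a formal
   specification: the ACSL contract states what the function computes,
   so a deductive verifier can check the implementation against it. */

/*@ requires n > 0;
  @ requires \valid_read(a + (0 .. n - 1));
  @ assigns \nothing;
  @ ensures \forall integer i; 0 <= i < n ==> \result >= a[i];
  @ ensures \exists integer i; 0 <= i < n && \result == a[i];
  @*/
int max_element(const int *a, size_t n) {
    int m = a[0];
    size_t i;
    /*@ loop invariant 1 <= i <= n;
      @ loop invariant \forall integer j; 0 <= j < i ==> m >= a[j];
      @ loop invariant \exists integer j; 0 <= j < i && m == a[j];
      @ loop assigns i, m;
      @ loop variant n - i;
      @*/
    for (i = 1; i < n; i++) {
        if (a[i] > m) {
            m = a[i];  /* track the running maximum */
        }
    }
    return m;
}
```

Pairing code with a contract like this is what lets a benchmark ask not only whether generated code compiles and passes tests, but whether a verifier can prove it meets its specification.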
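As a flavour of the kind of property Solvable Tuple Patterns targets, consider a function that pairs up the elements of two lists xs and ys to produce a list zs. One relational invariant linking the three list-like structures is the length relation below; this concrete example is our own illustration, not one taken from the paper.

$$\mathrm{len}(zs) = \min(\mathrm{len}(xs),\ \mathrm{len}(ys))$$

From positive samples alone (for example, runs where len(xs) = 3, len(ys) = 5, len(zs) = 3 and len(xs) = 4, len(ys) = 2, len(zs) = 2), an inference procedure can propose such a relation without ever seeing a counterexample.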