Advancements in Automated Code Generation and Evaluation

The field of automated code generation and evaluation is advancing quickly, with a focus on improving the quality and efficiency of code generation models. Researchers are exploring approaches to generate high-quality test cases, enhance automated assessment in programming education, and develop adaptive decoding strategies for large language models. Notably, combining program analysis with large language models has shown promising results for generating high-coverage unit tests. In parallel, new benchmarks and evaluation frameworks are being introduced to assess coding agents more rigorously. Some particularly noteworthy papers include:

  • CodeContests+, which introduces an LLM-based agent system for creating high-quality test cases for competitive programming problems.
  • AdaDec, an uncertainty-guided adaptive decoding framework that improves the reliability and efficiency of LLM-based code generation (a conceptual sketch of the idea follows this list).
  • OIBench, a high-quality, private, and challenging olympiad-level informatics dataset for benchmarking strong reasoning models.
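
To make the AdaDec entry above more concrete, the sketch below illustrates the general idea of uncertainty-guided adaptive decoding; it is not AdaDec's actual algorithm. Each decoding step is gated on the entropy of the next-token distribution: low-entropy (confident) steps use cheap greedy decoding, while high-entropy steps trigger a more expensive re-ranking pass. The model name, entropy threshold, top-k value, and one-step lookahead scoring are all placeholder assumptions chosen for illustration.

    # Illustrative sketch of uncertainty-guided adaptive decoding (not AdaDec itself).
    # Confident steps decode greedily; uncertain steps spend extra compute on re-ranking.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_NAME = "gpt2"        # placeholder model; any causal LM works
    ENTROPY_THRESHOLD = 2.5    # assumed value; a real framework would tune or learn this
    TOP_K = 5                  # number of candidates to re-rank on uncertain steps

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
    model.eval()

    @torch.no_grad()
    def adaptive_decode(prompt: str, max_new_tokens: int = 64) -> str:
        input_ids = tokenizer(prompt, return_tensors="pt").input_ids
        for _ in range(max_new_tokens):
            logits = model(input_ids).logits[:, -1, :]
            probs = torch.softmax(logits, dim=-1)
            entropy = -(probs * torch.log(probs + 1e-9)).sum(dim=-1).item()

            if entropy < ENTROPY_THRESHOLD:
                # Confident step: plain greedy decoding is cheap and usually sufficient.
                next_id = probs.argmax(dim=-1, keepdim=True)
            else:
                # Uncertain step: re-rank the top-k candidates with a one-step lookahead,
                # a stand-in for whatever heavier procedure a real framework would use.
                topk = probs.topk(TOP_K, dim=-1)
                best_id, best_score = None, float("-inf")
                for cand in topk.indices[0]:
                    cand = cand.view(1, 1)
                    lookahead = model(torch.cat([input_ids, cand], dim=-1)).logits[:, -1, :]
                    score = torch.log_softmax(lookahead, dim=-1).max().item()
                    if score > best_score:
                        best_id, best_score = cand, score
                next_id = best_id

            input_ids = torch.cat([input_ids, next_id], dim=-1)
            if next_id.item() == tokenizer.eos_token_id:
                break
        return tokenizer.decode(input_ids[0], skip_special_tokens=True)

    print(adaptive_decode("def fibonacci(n):"))

The point of the sketch is only the control flow: spend little compute when the model is confident and more when it is not, which is what makes adaptive decoding attractive for both reliability and efficiency.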

Sources

CodeContests+: High-Quality Test Case Generation for Competitive Programming

Leveraging Generative AI for Enhancing Automated Assessment in Programming Education Contests

AdaDec: Uncertainty-Guided Adaptive Decoding for LLM-based Code Generation

Boosting Rust Unit Test Coverage through Hybrid Program Analysis and Large Language Models

SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner

ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering

UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench

Reward Models Enable Scalable Code Verification by Trading Accuracy for Throughput

OIBench: Benchmarking Strong Reasoning Models with Olympiad in Informatics

SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks
