Advancements in Large Language Models for Code Generation and Analysis

The field of large language models (LLMs) is evolving rapidly, with a strong focus on improving code generation and analysis capabilities. Recent research has applied LLMs to automated code generation, code review, and code equivalence checking. Notable contributions include SwingArena, an evaluation framework that tests LLMs on realistic software development workflows such as long-context GitHub issue solving, and ResearchCodeBench, a benchmark that assesses whether LLMs can implement novel machine learning research code. Advances in reinforcement learning and fine-tuning techniques have also significantly improved LLM performance on code generation and analysis tasks. Challenges remain, particularly in ensuring the correctness and reliability of generated code, but the progress made in this area holds considerable promise for the future of software development and maintenance.
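A recurring pattern across several of the papers below (e.g., the complexity-metrics feedback approach, test-case synthesis, and coder/tester co-training) is a generate-test-refine loop: the model produces candidate code, the code is executed against tests, and failure messages are fed back into the next prompt. The sketch below is a minimal, illustrative version of that loop, not any specific paper's method; `query_llm` is a hypothetical stand-in for a real model client, and the stub task (`add`) exists only so the example runs end to end.

```python
# Minimal sketch of a feedback-driven generate-test-refine loop.
# Assumptions: `query_llm` is hypothetical; in practice it would call an LLM API.
from typing import List, Tuple

def query_llm(prompt: str) -> str:
    """Hypothetical model call; replace with a real LLM client."""
    # Placeholder so the sketch is runnable without a model.
    return "def add(a, b):\n    return a + b\n"

def run_tests(code: str, tests: List[Tuple[tuple, object]]) -> List[str]:
    """Execute candidate code and return a message for each failing case."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # demo only; sandbox untrusted code in practice
    except Exception as exc:
        return [f"candidate failed to load: {exc!r}"]
    func = namespace.get("add")
    if func is None:
        return ["candidate does not define `add`"]
    failures = []
    for args, expected in tests:
        try:
            got = func(*args)
        except Exception as exc:
            failures.append(f"add{args} raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"add{args} = {got!r}, expected {expected!r}")
    return failures

def generate_with_feedback(task: str, tests, max_rounds: int = 3) -> str:
    """Generate code, run the tests, and feed failures back into the prompt."""
    prompt, code = task, ""
    for _ in range(max_rounds):
        code = query_llm(prompt)
        failures = run_tests(code, tests)
        if not failures:
            return code  # every test passed
        prompt = f"{task}\n\nPrevious attempt failed:\n" + "\n".join(failures)
    return code  # best effort after max_rounds

if __name__ == "__main__":
    tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
    print(generate_with_feedback("Write `add(a, b)` returning the sum.", tests))
```

The same skeleton also accommodates other feedback signals in place of test failures, such as complexity metrics or reward-model scores, by changing what `run_tests` measures and reports back to the model.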

Sources

Large Language Model-Based Agents for Automated Research Reproducibility: An Exploratory Study in Alzheimer's Disease

SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving

Enhancing LLM-Based Code Generation with Complexity Metrics: A Feedback-Driven Approach

HardTests: Synthesizing High-Quality Test Cases for LLM Coding

CodeV-R1: Reasoning-Enhanced Verilog Generation

Fine-Tune an SLM or Prompt an LLM? The Case of Generating Low-Code Workflows

SwiftEval: Developing a Language-Specific Benchmark for LLM-generated Code Evaluation

EXP-Bench: Can AI Conduct AI Research Experiments?

Will Agents Replace Us? Perceptions of Autonomous Multi-Agent AI

Flow2Code: Evaluating Large Language Models for Flowchart-based Code Generation Capability

Improving LLM-Generated Code Quality with GRPO

HEC: Equivalence Verification Checking for Code Transformation via Equality Saturation

ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code

Multi Layered Autonomy and AI Ecologies in Robotic Art Installations

A Multi-agent LLM-based JUnit Test Generation with Strong Oracles

Towards More Effective Fault Detection in LLM-Based Unit Test Generation

A Preference-Driven Methodology for High-Quality Solidity Code Generation

How do Pre-Trained Models Support Software Engineering? An Empirical Study in Hugging Face

Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning

Automated Traffic Incident Response Plans using Generative Artificial Intelligence: Part 1 -- Building the Incident Response Benchmark

The Stress of Improvisation: Instructors' Perspectives on Live Coding in Programming Classes

Software Bill of Materials in Software Supply Chain Security: A Systematic Literature Review

Seed-Coder: Let the Code Model Curate Data for Itself

Design of a visual environment for programming by direct data manipulation

Solsmith: Solidity Random Program Generator for Compiler Testing

Boosting Open-Source LLMs for Program Repair via Reasoning Transfer and LLM-Guided Reinforcement Learning

VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation

Automatic Multi-level Feature Tree Construction for Domain-Specific Reusable Artifacts Management

CETBench: A Novel Dataset constructed via Transformations over Programs for Benchmarking LLMs for Code-Equivalence Checking

Generating Automotive Code: Large Language Models for Software Development and Verification in Safety-Critical Systems

On the Practices of Autonomous Systems Development: Survey-based Empirical Findings

Leveraging Reward Models for Guiding Code Review Comment Generation

hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation

From Developer Pairs to AI Copilots: A Comparative Study on Knowledge Transfer

ICPC-Eval: Probing the Frontiers of LLM Reasoning with Competitive Programming Contests

Tech-ASan: Two-stage check for Address Sanitizer