Advancements in Large Language Models for Software Development

The field of software development is seeing significant advances from the integration of Large Language Models (LLMs). Recent studies explore their application across code review, code translation, and code generation, with a clear trend toward improving efficiency and effectiveness through techniques such as fine-tuning and prompting. There is also growing emphasis on evaluating the quality and security of LLM-generated code, using metrics such as correctness, execution efficiency, and maintainability. Noteworthy papers in this area include TRACY, which introduces a comprehensive benchmark for the execution efficiency of LLM-translated code, and COMPASS, which proposes a multi-dimensional framework for evaluating code generation. Overall, the field is moving toward more robust and reliable LLMs that can be applied dependably in software development tasks.
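
As a rough illustration of the kind of evaluation these benchmarks formalize, the sketch below scores candidate implementations on two of the dimensions mentioned above: functional correctness (unit-test pass rate) and execution efficiency (wall-clock time). It is a minimal toy harness, not the TRACY or COMPASS implementation; the function names, test cases, and scoring scheme are hypothetical.

```python
# Illustrative sketch only: a toy harness for scoring candidate code on
# correctness and a crude efficiency signal. Not the TRACY/COMPASS tooling.
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EvalResult:
    passed: int             # test cases answered correctly
    total: int              # total test cases
    mean_runtime_s: float   # average wall-clock time per test case


def evaluate_candidate(candidate: Callable,
                       tests: List[Tuple[tuple, object]]) -> EvalResult:
    """Run a candidate against (inputs, expected_output) pairs,
    recording correctness and per-call wall-clock time."""
    passed, runtimes = 0, []
    for args, expected in tests:
        start = time.perf_counter()
        try:
            output = candidate(*args)
        except Exception:
            output = None
        runtimes.append(time.perf_counter() - start)
        if output == expected:
            passed += 1
    return EvalResult(passed, len(tests), sum(runtimes) / len(runtimes))


if __name__ == "__main__":
    # Two hypothetical "LLM-generated" translations of the same task:
    # sum the integers from 1 to n.
    def candidate_loop(n: int) -> int:      # straightforward translation
        return sum(range(1, n + 1))

    def candidate_formula(n: int) -> int:   # asymptotically faster translation
        return n * (n + 1) // 2

    tests = [((10,), 55), ((1000,), 500500), ((10**6,), 500000500000)]
    for name, fn in [("loop", candidate_loop), ("formula", candidate_formula)]:
        r = evaluate_candidate(fn, tests)
        print(f"{name}: {r.passed}/{r.total} correct, {r.mean_runtime_s:.6f}s avg")
```

Real benchmarks additionally control for hardware, repeat measurements, and aggregate over many tasks, but the same two signals, does the code pass its tests and how fast does it run, underlie the evaluations discussed here.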

Sources

The Impact of Large Language Models (LLMs) on Code Review Process

TRACY: Benchmarking Execution Efficiency of LLM-Based Code Translation

Large Language Models in the Data Science Lifecycle: A Systematic Mapping Study

Code Vulnerability Detection Across Different Programming Languages with AI Models

WIP: Leveraging LLMs for Enforcing Design Principles in Student Code: Analysis of Prompting Strategies and RAG

Clean Code, Better Models: Enhancing LLM Performance with Smell-Cleaned Dataset

How Much Can a Behavior-Preserving Changeset Be Decomposed into Refactoring Operations?

LinkAnchor: An Autonomous LLM-Based Agent for Issue-to-Commit Link Recovery

XAMT: Cross-Framework API Matching for Testing Deep Learning Libraries

Strengthening Programming Comprehension in Large Language Models through Code Generation

ChangePrism: Visualizing the Essence of Code Changes

RUM: Rule+LLM-Based Comprehensive Assessment on Testing Skills

The Hidden Cost of Readability: How Code Formatting Silently Consumes Your LLM Budget

COMPASS: A Multi-Dimensional Benchmark for Evaluating Code Generation in Large Language Models

Prompt Orchestration Markup Language

Measuring LLM Code Generation Stability via Structural Entropy

Static Analysis as a Feedback Loop: Enhancing LLM-Generated Code Beyond Correctness

Assessing the Quality and Security of AI-Generated Code: A Quantitative Analysis

On the need to perform comprehensive evaluations of automated program repair benchmarks: Sorald case study

LLaSO: A Foundational Framework for Reproducible Research in Large Language and Speech Model

SynthCoder: A Synthetical Strategy to Tune LLMs for Code Completion

Evaluation Guidelines for Empirical Studies in Software Engineering involving LLMs

Towards Scalable and Interpretable Mobile App Risk Analysis via Large Language Models
