Advancements in AI-Driven Research Automation

Research automation is evolving rapidly, with growing attention to using artificial intelligence (AI) and large language models (LLMs) to streamline parts of the research process. One notable direction is AI systems that assess the reproducibility of social science research, which could promote rigor and transparency in research practice. Another is the application of LLMs to title-abstract screening for systematic reviews, which could significantly reduce researchers' workload and make the review process more efficient. LLM-based automated code review and generation is also gaining traction, with potential benefits including improved code quality, reduced maintenance costs, and better collaboration among developers.

Noteworthy papers in this area include:

REPRO-Bench, which introduces a benchmark for evaluating how well AI agents assess the reproducibility of social science papers and reports a significant improvement in accuracy over existing agents.

SESR-Eval, which presents a dataset for evaluating LLMs on title-abstract screening and offers insights into how different LLMs perform on this task.

From Articles to Code, which demonstrates that LLMs can generate core algorithms from scientific publications, paving the way for on-demand code generation and reduced maintenance overhead.

LLM-Based Identification of Infostealer Infection Vectors, which uses LLMs to identify infection vectors from screenshots, highlighting the potential of AI-driven analysis to enhance threat intelligence.

Sources

REPRO-Bench: Can Agentic AI Systems Assess the Reproducibility of Social Science Research?

SESR-Eval: Dataset for Evaluating LLMs in the Title-Abstract Screening of Systematic Reviews

Automated Code Review Using Large Language Models at Ericsson: An Experience Report

Machine Learning Experiences: A story of learning AI for use in enterprise software testing that can be used by anyone

From Articles to Code: On-Demand Generation of Core Algorithms from Scientific Publications

Quality Evaluation of COBOL to Java Code Transformation

LLM-Based Identification of Infostealer Infection Vectors from Screenshots: The Case of Aurora
