Advances in Software Vulnerability Detection and Analysis

Software engineering research is moving toward more automated methods of vulnerability detection and analysis. Recent work has focused on building comprehensive datasets, including software defect datasets and vulnerability datasets, to support empirical research and the benchmarking of detection techniques. Large language models (LLMs) and other machine learning techniques have shown promise in tasks such as bug report analysis, binary code understanding, and process mining. The integration of LLMs with static analysis has also been explored for detecting hardware security bugs. Noteworthy papers in this area include BugsRepo, which introduces a curated dataset of bug reports and contributor information to support software maintenance tasks; BinPool, which presents a dataset of vulnerabilities for binary security analysis; and LASHED, which combines LLMs and static analysis for early detection of RTL bugs.

Sources

From Bugs to Benchmarks: A Comprehensive Survey of Software Defect Datasets

BugsRepo: A Comprehensive Curated Dataset of Bug Reports, Comments and Contributors Information from Bugzilla

BinPool: A Dataset of Vulnerabilities for Binary Security Analysis

BinCoFer: Three-Stage Purification for Effective C/C++ Binary Third-Party Library Detection

An Empirical Study on Common Defects in Modern Web Browsers Using Knowledge Embedding in GPT-4o

Enhancing Vulnerability Reports with Automated and Augmented Description Summarization

Unlocking User-oriented Pages: Intention-driven Black-box Scanner for Real-world Web Applications

An Empirical Study on the Capability of LLMs in Decomposing Bug Reports

On the Potential of Large Language Models to Solve Semantics-Aware Process Mining Tasks

Padding Matters -- Exploring Function Detection in PE Files

LASHED: LLMs And Static Hardware Analysis for Early Detection of RTL Bugs

An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding

When Deep Learning Meets Information Retrieval-based Bug Localization: A Survey