Large Language Models in Biomedical Applications

Research on large language models (LLMs) in biomedical applications is advancing rapidly, with a focus on making these models more effective and safer in real-world clinical settings. Recent work highlights the importance of evaluating LLMs in context-specific scenarios, particularly in low-resource settings and for diseases that are prevalent in particular regions. There is growing recognition that guideline-driven, dynamic benchmarking is needed to support the safe deployment of AI systems in healthcare. Researchers are also exploring new methods for optimizing LLMs, such as dual-phase self-evolution frameworks and dynamic bi-level optimization for data selection and utilization. Furthermore, studies have demonstrated the potential of LLM-based clinical decision support tools to reduce errors and improve patient care in primary care settings.

Noteworthy papers in this area include Retrieval-Augmented Clinical Benchmarking for Contextual Model Testing in Kenyan Primary Care, which introduces a methodology for creating a benchmark dataset and evaluation framework focused on Kenyan clinical care, and AI-based Clinical Decision Support for Primary Care, which evaluates the impact of LLM-based clinical decision support in live care. Another is HIVMedQA, which introduces a benchmark for open-ended medical question answering in HIV care and uses it to evaluate the current capabilities of LLMs in HIV management.

Sources

Evaluating the Effectiveness of Cost-Efficient Large Language Models in Benchmark Biomedical Tasks

Retrieval-Augmented Clinical Benchmarking for Contextual Model Testing in Kenyan Primary Care: A Methodology Paper

ChiMed 2.0: Advancing Chinese Medical Dataset in Facilitating Large Language Modeling

A Novel Self-Evolution Framework for Large Language Models

Towards physician-centered oversight of conversational diagnostic AI

LLM Data Selection and Utilization via Dynamic Bi-level Optimization

Mind the Gap: Evaluating the Representativeness of Quantitative Medical Language Reasoning LLM Benchmarks for African Disease Burdens

AI-based Clinical Decision Support for Primary Care: A Real-World Study

From Feedback to Checklists: Grounded Evaluation of AI-Generated Clinical Notes

A Custom-Built Ambient Scribe Reduces Cognitive Load and Documentation Burden for Telehealth Clinicians

HIVMedQA: Benchmarking large language models for HIV medical decision support
