Advances in Proteomics and Molecular Discovery

The field of proteomics and molecular discovery is advancing rapidly through the integration of machine learning and large language models. A key direction is the development of foundation models that unify multiple tasks and improve downstream performance. Another notable trend is the use of multimodal contrastive alignment and parameterized reasoning to enhance protein function prediction and drug discovery. Noteworthy papers include:

  • Prot2Text-V2, which introduces a novel multimodal sequence-to-text model for protein function prediction.
  • DrugPilot, an LLM-based agent with parameterized reasoning for drug discovery that outperforms existing agents.
  • ChemMLLM, a unified chemical multimodal large language model that achieves superior performance on molecule understanding and generation tasks.

Sources

Foundation model for mass spectrometry proteomics

Prot2Text-V2: Protein Function Prediction with Multimodal Contrastive Alignment

DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery

MolLangBench: A Comprehensive Benchmark for Language-Prompted Molecular Structure Recognition, Editing, and Generation

A Survey of Large Language Models for Text-Guided Molecular Discovery: from Molecule Generation to Optimization

ChemMLLM: Chemical Multimodal Large Language Model

Improving Chemical Understanding of LLMs via SMILES Parsing

Structure-Aligned Protein Language Model
