Proteomics and molecular discovery are advancing rapidly through the integration of machine learning and large language models. A key direction is the development of foundation models that unify multiple tasks in a single model and improve downstream performance. Another notable trend is the use of multimodal contrastive alignment and parameterized reasoning to enhance protein function prediction and drug discovery. Noteworthy papers include:
- Prot2Text-V2, which introduces a multimodal sequence-to-text model for protein function prediction.
- DrugPilot, an LLM-based agent with parameterized reasoning for drug discovery that outperforms existing agents.
- ChemMLLM, a unified chemical multimodal large language model that achieves superior performance on molecule understanding and generation tasks.