Advances in Molecular Representation and Drug Design

Research in molecular representation and drug design is moving quickly, with much of the current work aimed at better methods for predicting molecular properties and designing new drugs. Recent papers explore large language models, diffusion models, and energy-based models as ways to improve the accuracy and efficiency of both tasks. These approaches show promise on problems such as predicting mutational effects on proteins, identifying drug-drug interactions, and designing molecules with specific biological activities. Notable papers in this area include:

ActivityDiff proposes a diffusion model for de novo drug design that uses separately trained drug-target classifiers for both positive and negative guidance.

ImageDDI introduces an image-enhanced molecular motif sequence representation framework for drug-drug interaction prediction and reports outperforming state-of-the-art methods.

Mol-R1 presents a framework for explicit long chain-of-thought reasoning in molecule discovery that improves explainability and reasoning performance.

M2LLM proposes a multi-view framework that integrates three perspectives (a molecular structure view, a molecular task view, and a molecular rules view) and reports state-of-the-art performance on multiple benchmarks.

CWFBind introduces a weighted, fast, and accurate docking method based on local curvature features that achieves competitive performance across multiple docking benchmarks.

Energy-Based Models for Predicting Mutational Effects on Proteins proposes an approach to predicting changes in binding free energy that avoids estimating the full conformational distribution of a protein complex.

Chem3DLLM presents a unified protein-conditioned multimodal large language model and reports state-of-the-art performance on structure-based drug design tasks.

IBEX proposes an information-bottleneck-explored coarse-to-fine pipeline that addresses the chronic shortage of protein-ligand complex data in structure-based drug design.

A Dataset for Distilling Knowledge Priors from Literature for Therapeutic Design introduces a dataset of design priors extracted from the literature, which can be used to build models with strong priors for therapeutic design.

Sources

ActivityDiff: A diffusion model with Positive and Negative Activity Guidance for De Novo Drug Design

ImageDDI: Image-enhanced Molecular Motif Sequence Representation for Drug-Drug Interaction Prediction

Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery

$\text{M}^{2}$LLM: Multi-view Molecular Representation Learning with Large Language Models

CWFBind: Geometry-Awareness for Fast and Accurate Protein-Ligand Docking

Energy-Based Models for Predicting Mutational Effects on Proteins

Chem3DLLM: 3D Multimodal Large Language Models for Chemistry

IBEX: Information-Bottleneck-EXplored Coarse-to-Fine Molecular Generation under Limited Data

A Dataset for Distilling Knowledge Priors from Literature for Therapeutic Design
