Advances in Protein Representation Learning and Prediction

Protein research is advancing rapidly through new machine learning approaches. A key direction is the integration of heterogeneous data sources and modalities to improve protein representation learning and prediction. Recent work addresses cross-modal distributional mismatch and noisy relational graphs, with proposed solutions including optimal transport-based representation alignment and conditional graph generation-based information fusion (a minimal alignment sketch follows the list below). Another line of work develops mechanism-aware frameworks that unify residue-level post-translational modification (PTM) profiling with enzyme-substrate assignment, enabling models to learn biologically coherent patterns of cooperative and antagonistic modifications. Notable papers include:

  • Meta-Learning for Cross-Task Generalization in Protein Mutation Property Prediction, which introduces a novel mutation encoding strategy and shows clear gains over traditional fine-tuning when generalizing across mutation property prediction tasks.
  • A Novel Framework for Multi-Modal Protein Representation Learning, which proposes a unified framework that addresses cross-modal heterogeneity and noisy relational graphs.
  • Learning the PTM Code through a Coarse-to-Fine, Mechanism-Aware Framework, which establishes new state-of-the-art performance in multi-label site prediction and zero-shot enzyme assignment.
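To make the idea of optimal transport-based representation alignment concrete, the sketch below softly matches sequence-derived and structure-derived embeddings of the same proteins with entropic OT (Sinkhorn iterations) and a barycentric projection. This is an illustrative toy, not the method of any paper listed here; the embedding shapes, the squared Euclidean cost, and the uniform marginals are all assumptions made for the example.

```python
# Minimal sketch of OT-based cross-modal alignment for protein embeddings.
# Assumptions (not from any cited paper): squared Euclidean cost, uniform
# marginals, and barycentric projection as the alignment step.
import numpy as np

def sinkhorn(cost, epsilon=0.1, n_iters=200):
    """Entropic-regularized OT plan between two uniform empirical measures."""
    n, m = cost.shape
    K = np.exp(-cost / epsilon)              # Gibbs kernel
    a, b = np.ones(n) / n, np.ones(m) / m    # uniform marginals
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):                 # alternating scaling updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]       # transport plan, shape (n, m)

def align_modalities(seq_emb, struct_emb, epsilon=0.1):
    """Map sequence embeddings into the structure-embedding space via the
    OT plan (barycentric projection), reducing distributional mismatch."""
    # pairwise squared Euclidean cost between the two embedding clouds
    cost = ((seq_emb[:, None, :] - struct_emb[None, :, :]) ** 2).sum(-1)
    plan = sinkhorn(cost, epsilon)
    # row-normalize the plan and project each sequence point onto structure points
    return (plan / plan.sum(axis=1, keepdims=True)) @ struct_emb

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_emb = rng.normal(size=(64, 128))               # e.g. protein language-model embeddings
    struct_emb = rng.normal(loc=0.5, size=(64, 128))   # e.g. structure-graph embeddings
    aligned = align_modalities(seq_emb, struct_emb)
    print(aligned.shape)                               # (64, 128): sequence points aligned to structure space
```

In practice the aligned embeddings would feed a downstream fusion module; the papers above pair alignment with conditional graph generation to handle noisy relational graphs, which this toy example does not attempt.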

Sources

Meta-Learning for Cross-Task Generalization in Protein Mutation Property Prediction

Boltzmann Graph Ensemble Embeddings for Aptamer Libraries

A Multimodal Human Protein Embeddings Database: DeepDrug Protein Embeddings Bank (DPEB)

A Novel Framework for Multi-Modal Protein Representation Learning

Learning the PTM Code through a Coarse-to-Fine, Mechanism-Aware Framework

Augmenting Biological Fitness Prediction Benchmarks with Landscapes Features from GraphFLA
