Debiasing and Multimodal Capabilities in Large Language Models

Work on large language models is increasingly focused on mitigating dataset biases and improving multimodal capabilities. Recent developments include debiasing methods that learn to be undecided in their predictions for biased data samples, and preference optimization that addresses modality bias. There is also growing interest in strengthening multimodal comprehension through instruction-oriented preference alignment and high-value data selection for visual instruction tuning. These advances aim to improve the performance and robustness of large language models, particularly on out-of-domain and hard test samples.

Noteworthy papers include:

  • FairFlow, which introduces a debiasing framework that learns to remain undecided in its predictions on biased data samples, improving performance on out-of-domain test samples (a sketch of the undecided-learning idea follows this list).
  • Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization, which proposes a noise-aware preference optimization algorithm to mitigate modality bias in multimodal large language models.
  • Learning to Instruct for Visual Instruction Tuning, which reports significant gains on multimodal benchmarks by applying the training loss to both the instruction and response sequences (also sketched below).
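
To make the undecided-learning idea behind FairFlow more concrete, here is a minimal PyTorch sketch. It assumes biased samples have already been flagged (the `bias_mask` argument) and models "undecided" as matching a uniform distribution over classes; FairFlow's actual bias-detection procedure and objective may differ.

```python
import torch
import torch.nn.functional as F

def undecided_debias_loss(logits, labels, bias_mask):
    """Cross-entropy on regular samples; push predictions toward a uniform
    distribution ("undecided") on samples flagged as biased."""
    ce = F.cross_entropy(logits, labels, reduction="none")      # [batch]
    num_classes = logits.size(-1)
    uniform = torch.full_like(logits, 1.0 / num_classes)        # [batch, C]
    # KL(uniform || model): zero when the prediction is uniform, i.e. maximally undecided.
    undecided = F.kl_div(F.log_softmax(logits, dim=-1), uniform,
                         reduction="none").sum(dim=-1)          # [batch]
    return torch.where(bias_mask, undecided, ce).mean()
```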
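
Likewise, a minimal sketch of supervising instruction tokens alongside response tokens, the idea summarized for Learning to Instruct for Visual Instruction Tuning. The conventional visual instruction tuning recipe masks instruction tokens out of the loss; the `supervise_instruction` flag and helper names below are illustrative assumptions, not the paper's implementation.

```python
import torch.nn.functional as F

IGNORE_INDEX = -100  # common convention for tokens excluded from the loss

def build_labels(input_ids, instruction_mask, supervise_instruction=True):
    """Return causal-LM labels. With supervise_instruction=False this is the
    conventional recipe that masks instruction tokens out of the loss."""
    labels = input_ids.clone()
    if not supervise_instruction:
        labels[instruction_mask] = IGNORE_INDEX
    return labels

def lm_loss(logits, labels):
    # Shift so that position t predicts token t+1, as in standard causal-LM training.
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        labels[:, 1:].reshape(-1),
        ignore_index=IGNORE_INDEX,
    )
```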

Sources

FairFlow: Mitigating Dataset Biases through Undecided Learning

Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization

Bridging Writing Manner Gap in Visual Instruction Tuning by Creating LLM-aligned Instructions

Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs

MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning

Learning to Instruct for Visual Instruction Tuning
