Self-Supervised Advancements in Language and Vision Models

Research on language and vision models is shifting toward self-supervised learning: frameworks that let models improve without extensive human-annotated data, easing the cost and annotation bottlenecks of traditional supervised training. Approaches under exploration include self-play reinforcement learning, self-rewarding rubric-based reinforcement learning, and self-evolving vision-language models, aimed at strengthening both large language models and vision-language models. These methods target tasks such as long-context reasoning, open-ended reasoning, image quality assessment, and claim verification. Noteworthy papers in this area include: SPELL, which proposes a multi-role self-play reinforcement learning framework for long-context reasoning; Vision-Zero, which introduces a domain-agnostic framework for vision-language model self-improvement via strategic gamified self-play; and RESTRAIN, which presents a self-penalizing RL framework that converts the absence of gold labels into a useful learning signal.
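A minimal sketch of the shared idea behind these label-free reward schemes, assuming a group of sampled rollouts per prompt and a majority-vote pseudo-label that is penalized when consensus is weak; the function name, agreement threshold, and reward values below are illustrative assumptions and are not taken from any of the cited papers.

```python
import collections

def pseudo_label_rewards(answers, min_agreement=0.5):
    """Assign rewards from majority voting over sampled answers.

    answers: final answers extracted from a group of rollouts for the
             same prompt, with no gold label available.
    Rollouts that agree with the majority answer get +1, the rest -1.
    If overall agreement is too low, the vote is treated as spurious
    and the whole group is penalized instead of reinforced.
    """
    counts = collections.Counter(answers)
    majority_answer, votes = counts.most_common(1)[0]
    agreement = votes / len(answers)
    if agreement < min_agreement:
        # Low consensus: down-weight every rollout rather than trusting
        # a noisy pseudo-label (a "self-penalization" style signal).
        return [-1.0] * len(answers)
    return [1.0 if a == majority_answer else -1.0 for a in answers]

# Example: eight rollouts for one question, no ground truth.
rewards = pseudo_label_rewards(["42", "42", "41", "42", "42", "7", "42", "42"])
print(rewards)  # the majority answer "42" is reinforced, outliers penalized
```

These per-rollout rewards would then feed a standard policy-gradient update; the penalization branch is what prevents the model from reinforcing its own confident mistakes when no verifier or gold label exists.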

Sources

SPELL: Self-Play Reinforcement Learning for evolving Long-Context Language Models

Self-Rewarding Rubric-Based Reinforcement Learning for Open-Ended Reasoning

Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play

Self-Evolving Vision-Language Models for Image Quality Assessment via Voting and Ranking

Veri-R1: Toward Precise and Faithful Claim Verification via Online Reinforcement Learning

RESTRAIN: From Spurious Votes to Signals -- Self-Driven RL with Self-Penalization
