Vision-Language-Action Models: Advancements and Challenges

The field of Vision-Language-Action (VLA) models is advancing rapidly, with a focus on improving generalization across diverse robotic platforms and tasks. Recent work has explored soft-prompted transformers, agentic frameworks, and hierarchical architectures to improve the scalability and robustness of VLA models, yielding significant gains on benchmarks such as LIBERO and Android-in-the-Wild. Despite these successes, however, VLA models remain vulnerable to adversarial attacks and brittle under perturbations, underscoring the need for stronger defense strategies and evaluation practices.

Noteworthy papers in this area include:

X-VLA, which proposes a novel Soft Prompt approach for cross-embodiment robot learning, achieving state-of-the-art performance on several benchmarks.

VLA-0, which introduces a simple yet powerful approach to building VLA models without modifying the existing vocabulary or introducing special action heads, outperforming more elaborate models on the LIBERO benchmark.

LIBERO-Plus, which performs a systematic vulnerability analysis of VLA models, exposing critical weaknesses and highlighting the need for evaluation practices that assess reliability under realistic variation.

Sources

X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model

TabVLA: Targeted Backdoor Attacks on Vision-Language-Action Models

ManiAgent: An Agentic Framework for General Robotic Manipulation

VLA-0: Building State-of-the-Art VLAs with Zero Modification

Model-agnostic Adversarial Attack and Defense for Vision-Language-Action Models

LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models

Hi-Agent: Hierarchical Vision-Language Agents for Mobile Device Control

VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation
