Advancements in GUI Agents

The field of Graphical User Interface (GUI) agents is advancing rapidly, with a focus on more efficient, scalable, and robust methods for automating computer tasks. One recent shift is toward treating code execution as a core action, letting agents bypass long, inefficient GUI action sequences (a minimal sketch of this idea follows the list below). There is also growing emphasis on uncertainty-aware agents that handle complex, ambiguous tasks through adaptive perception and human-in-the-loop refinement, and on verifiable long-chain GUI datasets that support the development and evaluation of generalist GUI agents operating in realistic computer environments.

Noteworthy papers in this area include:

OID-PPO proposes a reinforcement learning framework for optimal interior design by transforming design guidelines into reward functions.

MagicGUI presents a foundational mobile GUI agent with a scalable data pipeline and reinforcement fine-tuning.

CoAct-1 introduces a multi-agent system that combines GUI-based control with direct programmatic execution.

Uncertainty-Aware GUI Agent addresses input redundancy and decision ambiguity through adaptive perception and human-in-the-loop refinement.

VeriGUI introduces a verifiable long-chain GUI dataset for developing and evaluating generalist GUI agents.

SEA proposes a self-evolution agent with a step-wise reward for computer use.

GuirlVG introduces a reinforcement learning-based GUI visual grounding method.

SEAgent proposes a self-evolving computer-use agent that learns autonomously from experience.

Test-Time Reinforcement Learning for GUI Grounding proposes a test-time scaling method, based on region consistency, for improving GUI grounding accuracy.
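
To make the "coding as a core action" idea concrete, here is a minimal, hypothetical sketch of an agent action space that mixes GUI primitives with direct code execution. It is not taken from CoAct-1 or any other paper listed here; the Action type, the execute dispatcher, and the subprocess call are illustrative assumptions.

```python
# Hypothetical sketch: a single-agent action space where one action kind
# executes code directly instead of issuing many GUI events.
# All names here are illustrative, not from any cited paper.
import subprocess
import sys
from dataclasses import dataclass

@dataclass
class Action:
    kind: str      # "click", "type", or "code"
    payload: dict  # coordinates/text for GUI actions, source for code

def execute(action: Action) -> str:
    """Dispatch one agent action and return an observation string."""
    if action.kind == "click":
        # Placeholder: a real agent would call an OS- or driver-level API here.
        return f"clicked at ({action.payload['x']}, {action.payload['y']})"
    if action.kind == "type":
        return f"typed {action.payload['text']!r}"
    if action.kind == "code":
        # Bypass a long GUI sequence by running a script in one step.
        result = subprocess.run(
            [sys.executable, "-c", action.payload["source"]],
            capture_output=True, text=True, timeout=30,
        )
        return result.stdout or result.stderr
    raise ValueError(f"unknown action kind: {action.kind}")

# Example: a batch file rename is one code action instead of hundreds of clicks.
print(execute(Action(kind="code", payload={
    "source": "print('batch file rename would run here')",
})))
```

The design choice this illustrates is simply that programmatic execution and GUI control share one action interface, so a planner can pick whichever is cheaper for a given subtask.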

Sources

OID-PPO: Optimal Interior Design using Proximal Policy Optimization by Transforming Design Guidelines into Reward Functions

MagicGUI: A Foundational Mobile GUI Agent with Scalable Data Pipeline and Reinforcement Fine-tuning

CoAct-1: Computer-using Agents with Coding as Actions

Uncertainty-Aware GUI Agent: Adaptive Perception through Component Recommendation and Human-in-the-Loop Refinement

VeriGUI: Verifiable Long-Chain GUI Dataset

SEA: Self-Evolution Agent with Step-wise Reward for Computer Use

GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning

SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

Test-Time Reinforcement Learning for GUI Grounding via Region Consistency
