Advancements in Autonomous GUI Agents and Agent-Web Interaction

Recent work on autonomous GUI agents and agent-web interaction aims to build more efficient and robust agents that can interact with graphical user interfaces and websites in a more human-like way. One key direction is the integration of planning and grounding capabilities in GUI agents, so that they can learn and improve through self-play optimization and training data distillation. Another is the development of declarative frameworks for agent-web interaction, which allow websites to expose reliable and auditable capabilities to AI agents while preserving user privacy and supporting human-AI collaboration. Multimodal dialogue datasets and models are also a notable trend, aiming to bridge the gap between traditional task-oriented dialogue systems and real-world scenarios. Researchers are additionally exploring large language models and multi-agent systems for automated page design and GUI development, with a focus on agent-native efficiency and reliability.

Noteworthy papers include:

Co-EPG, which proposes a self-iterative training framework for the co-evolution of planning and grounding in autonomous GUI agents.

Building the Web for Agents, which introduces a declarative framework for agent-web interaction that lets websites expose reliable and auditable capabilities to AI agents.

APD-Agents, which proposes a large language model-driven multi-agent framework for automated page design in mobile applications.

Sources

Co-EPG: A Framework for Co-Evolution of Planning and Grounding in Autonomous GUI Agents

Building the Web for Agents: A Declarative Framework for Agent-Web Interaction

MMWOZ: Building Multimodal Agent for Task-oriented Dialogue

Agent-Oriented Visual Programming for the Web of Things

APD-Agents: A Large Language Model-Driven Multi-Agents Collaborative Framework for Automated Page Design

Computer-Use Agents as Judges for Generative User Interface

D-GARA: A Dynamic Benchmarking Framework for GUI Agent Robustness in Real-World Anomalies
