Advancements in Mobile Agents

Research on mobile agents is advancing quickly, with recent work targeting accuracy, efficiency, and generalization across tasks, modalities, apps, and devices. Newer systems combine multimodal foundation models, retrieval-augmented generation, and cloud-device collaboration, which helps agents interpret user queries, interact with external environments, and learn from earlier mistakes. Among the notable papers, MobiAgent reports state-of-the-art performance in real-world mobile scenarios, and AppCopilot operationalizes a full-stack, closed-loop system for mobile agents. KG-RAG enhances GUI agent decision-making through knowledge graph-driven retrieval-augmented generation, and VehicleWorld introduces a comprehensive environment for intelligent vehicle interaction.

Sources

MobiAgent: A Systematic Framework for Customizable Mobile Agents

KG-RAG: Enhancing GUI Agent Decision-Making via Knowledge Graph-Driven Retrieval-Augmented Generation

Cloud-Device Collaborative Agents for Sequential Recommendation

AppCopilot: Toward General, Accurate, Long-Horizon, and Efficient Mobile Agent

MobileRAG: Enhancing Mobile Agent with Retrieval-Augmented Generation

MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents

VehicleWorld: A Highly Integrated Multi-Device Environment for Intelligent Vehicle Interaction
