Research on GUI agents and human-computer interaction is moving quickly toward more natural and intuitive interfaces. Recent work emphasizes mimicking human cognitive processes and incorporating adaptive learning mechanisms to improve agent performance. Notable advances include brain-inspired frameworks, adaptive region perception, and stochastic exploration methods for generating realistic and diverse GUI trajectories. Together, these approaches report improved GUI agent performance, enabling more effective automation and interaction in digital environments.
Noteworthy papers in this area include the following. BTL-UI proposes a brain-inspired framework for human-GUI interaction and achieves state-of-the-art performance on GUI understanding and interaction tasks. GUI-ARP introduces a framework for adaptive region perception and is competitive with both open-source and proprietary models. GUI-ReWalk presents a reasoning-enhanced framework for synthesizing realistic and diverse GUI trajectories, giving better coverage of diverse interaction flows. MobileRL develops an online agentic reinforcement learning framework and achieves state-of-the-art results for mobile GUI agents. UserRL proposes a unified framework for training and evaluating user-centric abilities through standardized gym environments paired with simulated users.