Advancements in Human-Computer Interaction and Surgical Video Analysis

The field of human-computer interaction is moving toward more accessible and convenient interfaces, with a growing focus on speech-based instructions for graphical user interface (GUI) agents. Researchers are working to improve the reliability and robustness of these agents, particularly in real-world scenarios where diverse anomalies occur. In surgical video analysis, there is a parallel need for large, high-quality annotated datasets to support intelligent systems for surgical training, decision support, and improved patient outcomes.

Notable papers in this area include:

- GUIRoboTron-Speech, an end-to-end autonomous GUI agent that accepts speech instructions and on-device screenshots to predict actions (see the sketch after this list).
- GynSurg, a comprehensive gynecology laparoscopic surgery dataset with rich annotations across multiple tasks.
- Meta-SurDiff, a classification diffusion model optimized by meta-learning for reliable online surgical phase recognition.
- AgentSynth, a scalable pipeline for automatically synthesizing high-quality tasks and trajectory datasets for generalist computer-use agents.
- GUI-Robust, a dataset for comprehensive GUI agent evaluation that explicitly incorporates common types of anomalies observed in everyday GUI interactions.
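To make the speech-to-action pipeline concrete, below is a minimal sketch of one agent step in the style described for GUIRoboTron-Speech: a spoken instruction and a device screenshot go in, a GUI action comes out. The names here (`transcribe`, `predict_action`, `Action`) are hypothetical stand-ins, not the paper's actual API, and the model calls are stubbed so the sketch runs.

```python
# Minimal sketch of a speech-driven GUI agent step. All function and
# type names are hypothetical illustrations, not the paper's real API.
from dataclasses import dataclass


@dataclass
class Action:
    kind: str                  # e.g. "click", "type", "scroll"
    target: tuple[int, int]    # (x, y) screen coordinates
    text: str = ""             # payload for "type" actions


def transcribe(audio: bytes) -> str:
    """Hypothetical ASR front end; a real agent would call a speech model
    (or, in an end-to-end design, skip transcription and embed audio directly)."""
    return "open the settings menu"  # stubbed transcript


def predict_action(instruction: str, screenshot: bytes) -> Action:
    """Hypothetical grounding model mapping (instruction, screenshot) -> action."""
    # A real agent would feed both modalities to a multimodal model;
    # here we return a fixed action so the sketch is runnable.
    return Action(kind="click", target=(120, 48))


def agent_step(audio: bytes, screenshot: bytes) -> Action:
    # 1. Convert the spoken instruction to text.
    instruction = transcribe(audio)
    # 2. Ground the instruction in the current screen and predict an action.
    return predict_action(instruction, screenshot)


if __name__ == "__main__":
    action = agent_step(audio=b"...", screenshot=b"...")
    print(action)  # Action(kind='click', target=(120, 48), text='')
```

In a full agent, `agent_step` would run in a loop, executing each predicted action and capturing a fresh screenshot; robustness benchmarks like GUI-Robust then test how such loops behave when the screen contains anomalies.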

Sources

GUIRoboTron-Speech: Towards Automated GUI Agents Based on Speech Instructions

GynSurg: A Comprehensive Gynecology Laparoscopic Surgery Dataset

Meta-SurDiff: Classification Diffusion Model Optimized by Meta Learning is Reliable for Online Surgical Phase Recognition

AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents

GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies
