Research on web agent navigation and automation is moving toward more interactive and scalable approaches. A current focus is on agents that can master short-horizon interactions across a variety of UI components, such as choosing the correct date in a date picker or scrolling within a container to extract information. Reliable execution of these subtasks is essential for robust web planning and navigation. Noteworthy papers include:
- WARC-Bench, which introduces a web navigation benchmark of 438 tasks designed to evaluate multimodal AI agents on subtasks.
- BrowserAgent, which proposes a more interactive agent that solves complex tasks through human-inspired browser actions and achieves competitive results across different Open-QA tasks.