Advances in Large Language Models for Task Planning and Physical Reasoning

Large language models (LLMs) are being extended beyond text generation toward task planning and physical reasoning. Recent work integrates LLMs with formal knowledge representations, such as ontologies, to strengthen their handling of symbolic knowledge, and new benchmarks probe whether models can combine domain knowledge, symbolic reasoning, and an understanding of real-world constraints. Noteworthy papers include Code-Driven Planning in Grid Worlds with Large Language Models, which proposes an iterative programmatic planning framework for grid-based tasks (sketched below); OntoURL, which introduces a comprehensive benchmark of LLMs' proficiency in handling ontologies; APEX, which equips LLMs with physics-driven foresight for real-time task planning; and PhyX, which assesses models' capacity for physics-grounded reasoning in visual scenarios.
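
The iterative programmatic planning idea can be illustrated with a minimal sketch: the model emits a candidate planning program, the program is executed in the grid environment, and execution feedback is fed back for the next attempt. This is only an illustration under stated assumptions, not the paper's actual method; `propose_program` is a hypothetical stand-in for an LLM call, and `GridWorld` is a toy environment invented here.

```python
# Minimal sketch of an iterative programmatic planning loop for a grid task.
# `propose_program` stands in for an LLM call and `GridWorld` for the task
# environment; both are hypothetical, not the interface from the paper.

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

class GridWorld:
    def __init__(self, size, start, goal):
        self.size, self.pos, self.goal = size, start, goal

    def step(self, action):
        dr, dc = MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        if not (0 <= r < self.size and 0 <= c < self.size):
            raise ValueError(f"action {action!r} leaves the grid at {self.pos}")
        self.pos = (r, c)

    def solved(self):
        return self.pos == self.goal

def propose_program(task, feedback):
    # Placeholder for the LLM call: given the task description and prior
    # execution feedback, return Python source defining plan(env).
    return "def plan(env):\n    for a in ['down', 'right']:\n        env.step(a)\n"

def iterative_plan(task, max_iters=5):
    feedback = ""
    for _ in range(max_iters):
        source = propose_program(task, feedback)
        env = GridWorld(size=3, start=(0, 0), goal=(1, 1))
        namespace = {}
        try:
            exec(source, namespace)   # compile the candidate program
            namespace["plan"](env)    # run it against the environment
        except Exception as err:      # execution errors become feedback
            feedback = f"Program failed: {err}"
            continue
        if env.solved():
            return source             # a program that reaches the goal
        feedback = f"Plan ended at {env.pos}, goal is {env.goal}"
    return None

print(iterative_plan("reach the goal cell") is not None)  # True
```

The key design choice this loop captures is that the plan is a program rather than a fixed action sequence, so execution errors and end-state mismatches can be returned to the model as concrete, checkable feedback.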

Sources

Code-Driven Planning in Grid Worlds with Large Language Models

OntoURL: A Benchmark for Evaluating Large Language Models on Symbolic Ontological Understanding, Reasoning and Learning

LODGE: Joint Hierarchical Task Planning and Learning of Domain Models with Grounded Execution

APEX: Empowering LLMs with Physics-Based Task Planning for Real-time Insight

BAR: A Backward Reasoning based Agent for Complex Minecraft Tasks

Addressing the Challenges of Planning Language Generation

SciCUEval: A Comprehensive Dataset for Evaluating Scientific Context Understanding in Large Language Models

PhyX: Does Your Model Have the "Wits" for Physical Reasoning?

SPhyR: Spatial-Physical Reasoning Benchmark on Material Distribution
