The field of game playing and reasoning with Large Language Models (LLMs) is advancing rapidly, with a focus on frameworks and benchmarks for evaluating and improving LLM performance. Researchers are using LLMs to generate code for board games, build puzzle game engines, and analyze complex rule interactions in dynamic environments. A key challenge is evaluating the semantic fidelity of LLMs in structured environments such as chess. Noteworthy papers include Boardwalk, which proposes a framework for creating board games with LLMs and reports a 55.6% success rate with its best-performing model, and PuzzleJAX, which introduces a GPU-accelerated puzzle game engine and description language for rapid benchmarking of tree search, reinforcement learning, and LLM reasoning.
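The core idea behind accelerator-friendly engines like PuzzleJAX is to replace a Python loop over environments with one vectorized update that steps an entire batch at once. The sketch below illustrates that pattern with NumPy; it is not the PuzzleJAX API, and the `batched_step` function, grid layout, and action encoding are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not the PuzzleJAX API): a batched grid-world step.
# One vectorized update advances every environment in the batch at once,
# which is what makes GPU-accelerated benchmarking of search/RL agents fast.

# Action encoding (assumed): 0 = up, 1 = down, 2 = left, 3 = right.
DIRS = np.array([[-1, 0], [1, 0], [0, -1], [0, 1]])

def batched_step(walls, agent_pos, actions):
    """walls: (B, H, W) bool; agent_pos: (B, 2) int; actions: (B,) in 0..3.

    Returns the (B, 2) agent positions after one synchronized step.
    """
    B, H, W = walls.shape
    proposed = agent_pos + DIRS[actions]
    # Keep proposed moves inside the grid.
    proposed[:, 0] = np.clip(proposed[:, 0], 0, H - 1)
    proposed[:, 1] = np.clip(proposed[:, 1], 0, W - 1)
    # Moves into walls are rejected; those agents stay put.
    blocked = walls[np.arange(B), proposed[:, 0], proposed[:, 1]]
    return np.where(blocked[:, None], agent_pos, proposed)

# Demo: two 3x3 environments stepped together.
walls = np.zeros((2, 3, 3), dtype=bool)
walls[0, 0, 1] = True                      # wall above env 0's agent
agent = np.array([[1, 1], [1, 1]])
out = batched_step(walls, agent, np.array([0, 3]))
# env 0's move up is blocked by the wall; env 1 moves right.
```

In a real JAX engine this function would additionally be JIT-compiled and run on the GPU, but the batching logic is the same.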