The field of instruction following in large language models (LLMs) is advancing rapidly, with a focus on improving the ability of LLMs to understand and execute complex instructions. Recent developments highlight the importance of evaluating LLMs in realistic scenarios, such as agentic applications and grounded environments, where they must follow instructions that involve complex constraints and nuanced language.

One key area of innovation is the development of new benchmarks and evaluation frameworks, such as GuideBench and AgentIF, which assess the ability of LLMs to follow domain-oriented guidelines and instructions in agentic scenarios. These benchmarks reveal that current LLMs still struggle with lengthy instructions, complex constraint structures, and nuanced language.

Another area of advancement is novel training methods and frameworks, such as BLEUBERI and DecIF, which leverage reference-based reward metrics and meta-decomposition, respectively, to improve instruction-following performance. These approaches show promising results, including improved performance on instruction-following tasks and greater flexibility and generalizability.

Notable papers in this area include BLEUBERI, which uses BLEU as a reward function to train LLMs (a minimal sketch of such a reward is given below); GuideBench, which provides a comprehensive benchmark for evaluating the domain-oriented guideline-following capabilities of LLMs; AgentIF, which introduces a benchmark for evaluating LLM instruction following in agentic scenarios; and DecIF, which presents a framework for generating diverse, high-quality instruction-following data using only LLMs.
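To make the reference-based reward idea concrete, the sketch below shows how a BLEU score computed against one or more reference responses could serve as a scalar reward in an RL-style fine-tuning loop. This is an illustrative reconstruction, not BLEUBERI's actual implementation: the function names (`bleu_reward`, `reward_batch`), the choice of the `sacrebleu` library, and the rescaling to [0, 1] are assumptions, and details such as smoothing and how rewards feed into the optimization objective will differ in the paper.

```python
# Illustrative sketch only: BLEUBERI's real training pipeline is not reproduced here.
# Assumed library: sacrebleu; function names below are hypothetical.
from typing import List

import sacrebleu


def bleu_reward(completion: str, references: List[str]) -> float:
    """Score one model completion against reference responses.

    sacrebleu reports BLEU on a 0-100 scale; rescale to [0, 1] so the
    value can be used directly as a reward signal.
    """
    score = sacrebleu.sentence_bleu(completion, references).score
    return score / 100.0


def reward_batch(completions: List[str], references: List[List[str]]) -> List[float]:
    """Compute a reward for each (completion, references) pair in a batch.

    In an RL-style fine-tuning loop these scalars would replace or
    complement a learned reward model.
    """
    return [bleu_reward(c, refs) for c, refs in zip(completions, references)]


if __name__ == "__main__":
    completions = ["The capital of France is Paris."]
    references = [[
        "Paris is the capital of France.",
        "The capital city of France is Paris.",
    ]]
    print(reward_batch(completions, references))  # value depends on n-gram overlap
```

One attraction of this kind of reference-based reward is that it needs no separately trained reward model, only reference responses, which is consistent with the flexibility these approaches report.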