The field of large language models (LLMs) is advancing rapidly, with particular focus on improving code generation and problem-solving capabilities. Recent work shows that LLMs can generate high-quality competitive programming problems, approaching 99% consistency with official judgments. LLMs have also been applied successfully to automotive scenario generation, where mid-size open-source models show promising results, and to scalable test-time compute frameworks that achieve IOI gold medal-level performance with open-weight models. In addition, LLM-guided search has been proposed as a sample-efficient approach to program learning that outperforms traditional gradient-based training. Notable papers in this area include:
- AutoCode, which introduces a system for generating competition-grade problem statements and test cases using multiple rounds of validation.
- NL2Scenic, which presents a framework and dataset for evaluating Scenic code generation and shows that mid-size open-source models can be a practical option for autonomous-driving scenario programming.
- GenCluster, which achieves IOI gold medal-level performance using open-weight models and a scalable test-time compute framework.
- LLM-ERM, which introduces a propose-and-verify framework for sample-efficient program learning via LLM-guided search.
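The propose-and-verify pattern behind LLM-ERM can be illustrated with a minimal sketch: a proposer generates candidate programs, and a verifier keeps only those consistent with the input-output examples. This is not the paper's implementation; the function names are illustrative, and the stub proposer below stands in for an actual LLM call.

```python
from typing import Iterable, Optional

Examples = list[tuple[int, int]]


def propose_candidates(examples: Examples) -> Iterable[str]:
    """Stand-in for an LLM proposer: yields candidate program sources.

    A real system would prompt a model with the examples; here we
    enumerate a few fixed hypotheses for illustration.
    """
    yield "def f(x):\n    return x + 1"
    yield "def f(x):\n    return x * 2"
    yield "def f(x):\n    return x * x"


def verify(source: str, examples: Examples) -> bool:
    """Check a candidate program against every input-output example."""
    namespace: dict = {}
    try:
        exec(source, namespace)
        f = namespace["f"]
        return all(f(x) == y for x, y in examples)
    except Exception:
        # Candidates that fail to compile or run are simply rejected.
        return False


def search(examples: Examples, budget: int = 10) -> Optional[str]:
    """Return the first proposed program consistent with the examples."""
    for i, source in enumerate(propose_candidates(examples)):
        if i >= budget:
            break
        if verify(source, examples):
            return source
    return None


if __name__ == "__main__":
    # Examples consistent only with squaring.
    print(search([(3, 9), (5, 25)]))
```

The sample efficiency of this style of search comes from the proposer: each candidate is a complete program checked exactly against the data, rather than a point updated by noisy gradient steps.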