The field of autonomous driving and AI-powered evaluation is evolving rapidly, with a focus on improving the safety and reliability of automated systems. Recent work centers on using large language models (LLMs) to evaluate and verify complex artifacts such as operational design domain (ODD) specifications and map transformations. At the same time, the inconsistency and limitations of LLM judgments have been highlighted, underscoring the need for careful design and human oversight in their application. Notable advances include tools and frameworks that automate the verification of operational boundaries and enable scalable assurance of autonomous driving systems. Overall, the field is moving toward a more integrated, human-in-the-loop approach that combines the strengths of AI and human expertise for more efficient and accurate evaluation and verification.

Noteworthy papers include: VeriODD, a tool that automates the translation of operational design domain specifications into formal languages; LLM-Assisted Tool for Joint Generation of Formulas and Functions, which proposes a pipeline for jointly generating logical formulas and executable predicates for map transformation verification; and Generate, Evaluate, Iterate, which refines LLM judges using synthetic data, pointing toward more efficient and scalable evaluation processes.
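To make the formula/predicate pairing concrete, the following is a minimal sketch of the general idea: a logical property of a map transformation is written informally and mirrored by an executable Python predicate that checks it on concrete data. The transformation, property, and all function names here are illustrative assumptions, not the actual pipeline from the paper.

```python
# Hypothetical sketch of pairing a logical formula with an executable
# predicate for map transformation verification. The transformation
# (a fixed 5 m eastward shift) and the property are invented examples.
from typing import Callable, List, Tuple

Point = Tuple[float, float]

def transform(points: List[Point]) -> List[Point]:
    """Example map transformation: shift every point east by 5 m."""
    return [(x + 5.0, y) for x, y in points]

# Informal logical formula the transformation should satisfy:
#   forall p in input: distance(p, transform(p)) = 5
def distance_preserved(inp: List[Point], out: List[Point]) -> bool:
    """Executable predicate mirroring the formula above."""
    return all(
        abs(((ox - ix) ** 2 + (oy - iy) ** 2) ** 0.5 - 5.0) < 1e-9
        for (ix, iy), (ox, oy) in zip(inp, out)
    )

def verify(points: List[Point],
           fn: Callable[[List[Point]], List[Point]],
           pred: Callable[[List[Point], List[Point]], bool]) -> bool:
    """Run the transformation and check the predicate against its output."""
    return pred(points, fn(points))

print(verify([(0.0, 0.0), (1.0, 2.0)], transform, distance_preserved))
```

In a jointly generated setting, an LLM would produce both the formula and the predicate together, so that the formal statement and its executable check stay in sync and the predicate can be run against real map data.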