The field of large language models is moving toward increasingly safety-critical applications, with a focus on ensuring safe and interpretable interactions between humans, machines, and the environment. Recent work integrates large language models with multimodal sensor data, such as visual perception and geocoded positioning, to generate natural-language alerts and improve decision-making in areas like electric vehicle integration and construction safety inspections. Notable advances include the use of vision-language models to assess object-detection results and guide their refinement, as well as multi-agent architectures for infrastructure-as-code generation. Together, these innovations stand to enhance occupational safety monitoring, improve the reliability of infrastructure deployments, and provide provably safe decision-making for automated vehicles.
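To make the VLM-guided refinement pattern concrete, here is a minimal sketch: a detector's borderline outputs are passed to a generic VLM (represented by a plain callable, since none of the papers above specifies a client API) for confirmation or rejection. All names, the confidence threshold, and the prompt wording are illustrative assumptions, not taken from MonitorVLM or any cited work.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Detection:
    label: str
    confidence: float
    bbox: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels

def refine_detections(
    image_path: str,
    detections: List[Detection],
    vlm: Callable[[str, str], str],   # (image_path, prompt) -> free-text answer
    review_below: float = 0.6,        # only detections under this score are reviewed
) -> List[Detection]:
    """Keep confident detections; ask the VLM to verify borderline ones."""
    kept: List[Detection] = []
    for det in detections:
        if det.confidence >= review_below:
            kept.append(det)
            continue
        prompt = (
            f"Within the region {det.bbox} of this image, is there a "
            f"'{det.label}'? Answer strictly YES or NO."
        )
        if vlm(image_path, prompt).strip().upper().startswith("YES"):
            kept.append(det)  # the VLM confirms the borderline detection
    return kept

# Usage with a stub VLM that rejects everything it is asked about:
if __name__ == "__main__":
    stub = lambda image, prompt: "NO"
    dets = [Detection("hard hat", 0.9, (10, 10, 60, 60)),
            Detection("hard hat", 0.3, (200, 40, 260, 110))]
    print(refine_detections("site.jpg", dets, stub))  # only the 0.9 detection survives
```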
Noteworthy papers include:

- MonitorVLM, which introduces a novel vision-language framework for safety-violation detection in mining operations, achieving significant improvements in precision, recall, and F1 score.
- SanDRA, which proposes a safe LLM-based decision-making framework for automated vehicles that uses reachability analysis to provide provably safe, legally compliant driving actions (sketched below).
- MACOG, which presents a multi-agent LLM-based architecture for infrastructure-as-code generation, producing syntactically valid, policy-compliant, and semantically coherent Terraform configurations.
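To illustrate the reachability idea behind SanDRA, the sketch below shows one simple form such a safety filter can take: accelerations proposed by an LLM planner are kept only if the ego vehicle's worst-case reachable position stays behind the lead vehicle's worst-case stopping position. This is a deliberately simplified one-dimensional longitudinal model with hypothetical parameters, not SanDRA's actual formulation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class State:
    position: float  # m along the lane (rear bumper for the lead vehicle)
    speed: float     # m/s, non-negative

def reachable_front(ego: State, accel: float, horizon: float) -> float:
    """Furthest position the ego can reach under constant acceleration,
    clipping at standstill (the vehicle does not roll backwards)."""
    if accel < 0 and ego.speed + accel * horizon <= 0:
        t_stop = ego.speed / -accel  # time until standstill
        return ego.position + ego.speed * t_stop + 0.5 * accel * t_stop**2
    return ego.position + ego.speed * horizon + 0.5 * accel * horizon**2

def lead_stop_position(lead: State, max_brake: float, horizon: float) -> float:
    """Closest position the lead vehicle's rear can occupy, assuming it
    brakes as hard as physically possible (max_brake > 0, in m/s^2)."""
    t_stop = min(horizon, lead.speed / max_brake)
    return lead.position + lead.speed * t_stop - 0.5 * max_brake * t_stop**2

def filter_safe(ego: State, lead: State, candidate_accels: List[float],
                horizon: float = 3.0, lead_max_brake: float = 8.0,
                margin: float = 2.0) -> List[float]:
    """Keep only candidate accelerations whose worst-case reachable set
    preserves a safety margin to the lead vehicle's stopping position."""
    safe_rear = lead_stop_position(lead, lead_max_brake, horizon)
    return [a for a in candidate_accels
            if reachable_front(ego, a, horizon) + margin <= safe_rear]

if __name__ == "__main__":
    ego = State(position=0.0, speed=15.0)
    lead = State(position=40.0, speed=12.0)
    llm_proposals = [2.0, 0.0, -2.0, -4.0]  # m/s^2, e.g. parsed from an LLM plan
    print(filter_safe(ego, lead, llm_proposals))  # the +2.0 proposal is rejected
```

The key design point, shared with reachability-based safety layers in general, is that the LLM is never trusted to certify safety itself: its proposals are treated as untrusted candidates and pass through a formally grounded filter before execution.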