Research on traffic forecasting and traffic scene understanding is moving toward models that capture complex spatial-temporal dependencies more accurately and more efficiently. Recent work uses self-attention mechanisms, machine learning approaches, and hybrid frameworks to improve predictive performance. For traffic scene understanding in particular, combining spatio-temporal information with vision-language models is becoming increasingly important.
Noteworthy papers in this area include: "Capturing Complex Spatial-Temporal Dependencies in Traffic Forecasting: A Self-Attention Approach", which proposes a Spatial-Temporal Self-Attention Model for traffic forecasting; "Weaver: Kronecker Product Approximations of Spatiotemporal Attention for Traffic Network Forecasting", which introduces an attention-based model that uses Kronecker product approximations to decompose spatiotemporal attention; "Spatio-Temporal Data Enhanced Vision-Language Model for Traffic Scene Understanding", which proposes a Spatio-Temporal Enhanced Model based on CLIP for traffic scene understanding; and "HyperD: Hybrid Periodicity Decoupling Framework for Traffic Forecasting", which proposes a framework that decouples traffic data into periodic and residual components.
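
To make the Kronecker product idea concrete, the following is a minimal NumPy sketch of the general identity that such decompositions rely on. It is not the Weaver architecture: the function names, the random attention matrices, and the toy sizes are all assumptions chosen for illustration. It shows that if a joint attention matrix over all (sensor, time step) pairs is assumed to factor as the Kronecker product of a spatial and a temporal attention matrix, it can be applied without ever building the joint matrix.

```python
import numpy as np


def row_softmax(scores):
    """Softmax over the last axis, shifted for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)


def factored_attention(x, spatial_attn, temporal_attn):
    """Apply kron(spatial_attn, temporal_attn) to x without forming it.

    x has shape (num_sensors, num_steps). Mixing rows with the spatial
    matrix and columns with the temporal matrix is equivalent to applying
    the Kronecker product to the row-major flattening of x.
    """
    return spatial_attn @ x @ temporal_attn.T


def full_attention(x, spatial_attn, temporal_attn):
    """Reference version that materialises the joint (N*T, N*T) matrix."""
    joint = np.kron(spatial_attn, temporal_attn)
    return (joint @ x.reshape(-1)).reshape(x.shape)


# Hypothetical toy sizes and random scores, for illustration only.
rng = np.random.default_rng(0)
num_sensors, num_steps = 4, 6
x = rng.normal(size=(num_sensors, num_steps))
spatial_attn = row_softmax(rng.normal(size=(num_sensors, num_sensors)))
temporal_attn = row_softmax(rng.normal(size=(num_steps, num_steps)))

out_fast = factored_attention(x, spatial_attn, temporal_attn)
out_full = full_attention(x, spatial_attn, temporal_attn)
```

In this sketch the factored form stores N² + T² attention weights instead of (N·T)², which is the kind of saving that motivates factoring spatiotemporal attention; how the factors are learned and combined in any particular paper is outside what this example covers.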