Geospatial and environmental machine learning is advancing rapidly, with a focus on more robust and generalizable models for real-world applications. Recent research emphasizes benchmarking and evaluating models on diverse, high-impact tasks and domains. This has led to new datasets and benchmarks that capture the heterogeneity and scale of real-world data, such as those found in remote sensing and seismic interpretation. Notable papers include ExEBench, which introduces a benchmark for extreme earth events and promotes the development of novel ML methods for disaster management; FedRS-Bench, which provides a realistic federated dataset and benchmark for remote sensing, enabling collaborative model training across decentralized data sources; and A Large-scale Benchmark on Geological Fault Delineation Models, which systematically assesses pretraining, fine-tuning, and joint training strategies under varying degrees of domain shift.