The fields of cloud computing and machine learning are evolving rapidly, with a focus on improving efficiency, reducing costs, and enhancing performance. Recent developments have centered on optimizing resource allocation, autoscaling, and serverless computing. Researchers are exploring new approaches to predicting runtime, allocating resources, and managing workloads, yielding significant improvements in latency, throughput, and energy consumption. Notably, graph neural networks, hierarchical graph learning, and operator-level autoscaling are showing promising results.
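To make the autoscaling theme concrete, here is a minimal sketch of an SLO-driven scaling decision, assuming a p95 latency signal and simple step scaling. The thresholds, replica bounds, and function name are illustrative assumptions, not the policy of any paper summarized below.

```python
# Minimal sketch of an SLO-driven autoscaling decision (illustrative only).
# Scale out when observed tail latency violates the SLO; scale in when
# there is comfortable headroom; otherwise hold steady.

def desired_replicas(p95_latency_ms: float, slo_ms: float,
                     current: int, min_replicas: int = 1,
                     max_replicas: int = 16) -> int:
    """Return the replica count for the next control interval."""
    if p95_latency_ms > slo_ms:           # SLO violated: scale out
        target = current + 1
    elif p95_latency_ms < 0.5 * slo_ms:   # ample headroom: scale in
        target = current - 1
    else:                                 # within band: hold steady
        target = current
    return max(min_replicas, min(max_replicas, target))

# Example: a p95 of 180 ms against a 150 ms SLO triggers a scale-out.
print(desired_replicas(p95_latency_ms=180.0, slo_ms=150.0, current=4))  # -> 5
```

Operator-level approaches apply this kind of decision per operator in the serving graph rather than per whole model, which is how frameworks like the one in "From Models to Operators" (below) can save GPUs while still meeting SLOs.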
Noteworthy papers include:

- SERFLOW, which reduces cloud costs by over 23% while adapting efficiently to dynamic workloads.
- Panther, a cost-effective privacy-preserving framework for GNN training and inference services, which reduces training and inference time by an average of 75.28% and 82.80%, respectively.
- Vortex, an SLO-first approach that achieves significantly lower and more stable latencies than existing ML serving platforms.
- From Models to Operators, an operator-level autoscaling framework that preserves SLOs with up to 40% fewer GPUs and 35% less energy.
- ScaleDL, a runtime prediction framework that improves prediction accuracy and generalizability, achieving 6× lower MRE and 5× lower RMSE than baseline models (both metrics are sketched below).
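For context on ScaleDL's reported gains, MRE (mean relative error) and RMSE (root mean squared error) are standard measures of runtime prediction quality. The sketch below shows how each is computed; the sample runtimes are invented for illustration and do not come from the paper.

```python
import math

def mre(y_true, y_pred):
    """Mean relative error: average of |pred - true| / true."""
    return sum(abs(p - t) / t for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of the mean squared residual."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Invented job runtimes (seconds), for illustration only.
actual    = [12.0, 30.0, 45.0, 8.0]
predicted = [11.0, 33.0, 41.0, 9.0]
print(f"MRE:  {mre(actual, predicted):.3f}")   # ~0.099
print(f"RMSE: {rmse(actual, predicted):.3f}")  # ~2.598
```

MRE normalizes each error by the true runtime, so it treats short and long jobs evenly, while RMSE penalizes large absolute misses more heavily; reporting both, as ScaleDL does, covers both failure modes.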