Advances in Cloud Computing and Machine Learning

Research at the intersection of cloud computing and machine learning is evolving rapidly, with a focus on improving efficiency, reducing costs, and enhancing performance. Recent work centers on optimizing resource allocation, autoscaling, and serverless computing. Researchers are exploring new approaches to predicting runtime, allocating resources, and managing workloads, yielding significant improvements in latency, throughput, and energy consumption. Notably, graph neural networks, hierarchical graph learning, and operator-level autoscaling are showing promising results.
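
Operator-level autoscaling, for example, makes scaling decisions per pipeline stage rather than for the model as a whole, so a breaching stage can gain a replica while an over-provisioned stage releases one. The sketch below is a minimal, hypothetical illustration of that idea; the operator names, SLO thresholds, and scale_decision helper are assumptions, not the interface of any framework surveyed here.

```python
# Hypothetical sketch of operator-level autoscaling: each pipeline stage
# (e.g. prefill -> decode) reports its own tail latency and is scaled
# independently. Names and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class OperatorStats:
    name: str
    replicas: int
    p95_latency_ms: float   # observed tail latency for this operator
    slo_ms: float           # per-operator latency budget

def scale_decision(op: OperatorStats, scale_down_frac: float = 0.5) -> int:
    """Return a new replica count for one operator.

    Adds a replica when the operator breaches its latency budget and removes
    one when it runs far under budget, instead of resizing the whole model.
    """
    if op.p95_latency_ms > op.slo_ms:
        return op.replicas + 1                    # SLO breach: scale this stage up
    if op.p95_latency_ms < scale_down_frac * op.slo_ms and op.replicas > 1:
        return op.replicas - 1                    # large slack: release a replica
    return op.replicas                            # within budget: hold steady

pipeline = [
    OperatorStats("prefill", replicas=4, p95_latency_ms=180.0, slo_ms=150.0),
    OperatorStats("decode",  replicas=8, p95_latency_ms=40.0,  slo_ms=200.0),
]
for op in pipeline:
    print(op.name, "->", scale_decision(op), "replicas")
```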

Several papers are particularly noteworthy. SERFLOW, a cross-service cost optimization framework for SLO-aware inference, reduces cloud costs by over 23% while adapting efficiently to dynamic workloads. Panther, a cost-effective privacy-preserving framework for GNN training and inference services, cuts training and inference time by an average of 75.28% and 82.80%, respectively. Vortex, an SLO-first serving system, achieves significantly lower and more stable latencies than existing ML serving platforms. From Models to Operators, an operator-level autoscaling framework for large generative models, preserves SLOs with up to 40% fewer GPUs and 35% less energy. ScaleDL, a runtime prediction framework for distributed deep learning workloads, improves prediction accuracy and generalizability, achieving 6× lower MRE and 5× lower RMSE than baseline models.
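
For context on the ScaleDL numbers, MRE (mean relative error) and RMSE (root-mean-square error) are standard regression metrics for runtime prediction. The sketch below shows how they are typically computed; the sample runtimes are invented, and this is not ScaleDL's actual evaluation code.

```python
# Minimal sketch of the two error metrics quoted for runtime prediction.
# The per-step runtimes are made-up illustrative values.
import math

def mre(predicted: list[float], actual: list[float]) -> float:
    """Mean relative error: average of |pred - true| / true."""
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted: list[float], actual: list[float]) -> float:
    """Root-mean-square error, in the same units as the runtimes."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Hypothetical per-step runtimes (seconds) for a distributed training job.
actual    = [120.0, 95.0, 210.0, 150.0]
predicted = [118.0, 99.0, 200.0, 158.0]
print(f"MRE:  {mre(predicted, actual):.3f}")
print(f"RMSE: {rmse(predicted, actual):.2f} s")
```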

Sources

Green Bin Packing

SERFLOW: A Cross-Service Cost Optimization Framework for SLO-Aware Dynamic ML Inference

Fix: Externalizing Network I/O in Serverless Computing

NOMAD - Navigating Optimal Model Application to Datastreams

WindMiL: Equivariant Graph Learning for Wind Loading Prediction

Panther: A Cost-Effective Privacy-Preserving Framework for GNN Training and Inference Services in Cloud Environments

Possible Futures for Cloud Cost Models

Learned Cost Model for Placement on Reconfigurable Dataflow Hardware

HGraphScale: Hierarchical Graph Learning for Autoscaling Microservice Applications in Container-based Cloud Computing

Roadrunner: Accelerating Data Delivery to WebAssembly-Based Serverless Functions

Vortex: Hosting ML Inference and Knowledge Retrieval Services With Tight Latency and Throughput Requirements

From Models to Operators: Rethinking Autoscaling Granularity for Large Generative Models

ScaleDL: Towards Scalable and Efficient Runtime Prediction for Distributed Deep Learning Workloads
