The field of distributed computing is moving toward decentralized, dynamic approaches for managing complex workloads and geographically dispersed resources. Researchers are exploring solutions that optimize compute placement, job scheduling, and resource allocation, with a particular focus on network-aware and machine-learning-based techniques. These advances aim to improve the efficiency, scalability, and performance of distributed systems, enabling them to handle increasingly complex and data-intensive applications. Notable papers in this area include:
- LIDC, which introduces a decentralized control plane for compute placement using semantic names, allowing for location-independent job execution and dynamic resource allocation.
- Learning to Schedule, which presents a supervised learning framework for network-aware job scheduling, achieving higher accuracy in node selection compared to traditional schedulers.
- Machine Learning and CPU Scheduling Co-Optimization, which proposes a co-optimization algorithm for distributed machine learning and CPU scheduling, improving cost optimality by over 50%.
- Outperforming Multiserver SRPT at All Loads, which introduces a new scheduling policy that provably achieves lower mean response time than existing policies across all loads and job size distributions.
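For context on the last item, SRPT (Shortest Remaining Processing Time) always runs the job closest to completion, preempting on new arrivals. The sketch below is a minimal single-server SRPT simulator, not code from the paper: the function name, the `(arrival_time, size)` job representation, and the assumption of exactly known job sizes are all illustrative choices.

```python
import heapq

def srpt_mean_response(jobs):
    """Simulate single-server SRPT and return the mean response time.

    jobs: list of (arrival_time, size) tuples with known job sizes.
    Response time = completion time - arrival time.
    """
    events = sorted(jobs)          # arrivals in time order
    heap = []                      # min-heap of (remaining_size, arrival_time)
    t = 0.0
    total_response = 0.0
    i, n = 0, len(events)
    while i < n or heap:
        if not heap:               # server idle: jump to the next arrival
            t = max(t, events[i][0])
        while i < n and events[i][0] <= t:
            heapq.heappush(heap, (events[i][1], events[i][0]))
            i += 1
        rem, arr = heapq.heappop(heap)   # job with least remaining work
        next_arrival = events[i][0] if i < n else float("inf")
        if t + rem <= next_arrival:      # job finishes before next arrival
            t += rem
            total_response += t - arr
        else:                            # new arrival preempts: run partially
            rem -= next_arrival - t
            t = next_arrival
            heapq.heappush(heap, (rem, arr))
    return total_response / n
```

For example, with jobs `[(0, 3), (1, 1)]`, SRPT preempts the size-3 job at time 1 to finish the size-1 job first, giving a mean response time of 2.5 versus 3.0 under first-come-first-served; the paper's contribution is a policy that provably beats the multiserver generalization of this idea at every load.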