Advancements in Distributed Computing and Scheduling

The field of distributed computing is moving toward decentralized, dynamic approaches for managing complex workloads and geographically dispersed resources. Researchers are exploring solutions that optimize compute placement, job scheduling, and resource allocation, with a focus on network-aware and machine-learning-based techniques. These advances aim to improve the efficiency, scalability, and performance of distributed systems, enabling them to handle complex, data-intensive applications. Notable papers in this area include:

  • LIDC, which introduces a decentralized control plane for compute placement using semantic names, allowing for location-independent job execution and dynamic resource allocation.
  • Learning to Schedule, which presents a supervised learning framework for network-aware job scheduling, achieving higher accuracy in node selection compared to traditional schedulers.
  • Machine Learning and CPU Scheduling Co-Optimization, which proposes a co-optimization algorithm for distributed machine learning and CPU scheduling, improving cost optimality by over 50%.
  • Outperforming Multiserver SRPT at All Loads, which introduces a new scheduling policy that provably achieves lower mean response time than existing policies across all loads and job size distributions.
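For context on the last item: SRPT (Shortest Remaining Processing Time) preemptively serves whichever job has the least work left, which minimizes mean response time on a single server; the paper studies policies that beat its multiserver extension. The sketch below is an illustrative single-server SRPT simulation, not the paper's multiserver policy, and the job list and function name are assumptions for the example:

```python
import heapq

def srpt_mean_response_time(jobs):
    """Simulate preemptive single-server SRPT.

    jobs: list of (arrival_time, size) pairs.
    Returns the mean response time (completion - arrival).
    """
    jobs = sorted(jobs)            # process arrivals in time order
    t = 0.0                        # simulation clock
    i = 0                          # index of next arrival
    active = []                    # min-heap of [remaining_work, arrival_time]
    total_response = 0.0
    n = len(jobs)
    while i < n or active:
        if not active:
            t = max(t, jobs[i][0])       # idle until the next arrival
        while i < n and jobs[i][0] <= t:  # admit everything that has arrived
            heapq.heappush(active, [jobs[i][1], jobs[i][0]])
            i += 1
        rem, arr = active[0]
        # Run the shortest-remaining job until it finishes or a new job arrives.
        next_arrival = jobs[i][0] if i < n else float("inf")
        run = min(rem, next_arrival - t)
        t += run
        active[0][0] -= run  # shrinking the heap minimum preserves heap order
        if active[0][0] <= 1e-12:
            heapq.heappop(active)
            total_response += t - arr
    return total_response / n

# Two jobs: size 3 at t=0, size 1 at t=1. SRPT preempts the long job,
# finishing the short one at t=2 and the long one at t=4.
print(srpt_mean_response_time([(0, 3), (1, 1)]))  # → 2.5
```

Under FIFO the same workload yields a mean response time of 3.0, which illustrates why SRPT is the single-server benchmark these multiserver policies are measured against.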

Sources

LIDC: A Location Independent Multi-Cluster Computing Framework for Data Intensive Science

Learning to Schedule: A Supervised Learning Framework for Network-Aware Scheduling of Data-Intensive Workloads

Machine Learning and CPU (Central Processing Unit) Scheduling Co-Optimization over a Network of Computing Centers

Scheduling Data-Intensive Workloads in Large-Scale Distributed Systems: Trends and Challenges

Outperforming Multiserver SRPT at All Loads
