Advancements in Resource Sharing and Isolation for High-Performance Computing

High-performance computing is moving toward more efficient and sustainable use of resources, with a focus on sharing and mutualizing infrastructure. The shift is driven by the growing environmental impact of ICT and the need to reduce energy consumption and waste. Researchers are exploring new approaches to time-sharing and resource allocation, such as autonomous GPU sharing and container-based task dispatching, to improve utilization and reduce costs. There is also a growing emphasis on isolation and security, with new methods proposed to measure and understand performance interference and isolation at the system software layer.

Notable papers in this area include:

GPUnion presents a campus-scale GPU sharing platform that enables voluntary participation while preserving full provider autonomy.

The Case for Time-Shared Computing Resources advocates managing fewer physical resources by improving resource sharing between tenants.

Locked In, Leaked Out proposes a novel way to understand and measure performance interference and isolation at the system software layer.

Using Containers to Speed Up Development, to Run Integration Tests and to Teach About Distributed Systems describes how containers can ease the development and testing of distributed systems.

GlideinBenchmark presents a new web application for benchmarking resources and optimizing provisioning.

Towards Experiment Execution in Support of Community Benchmark Workflows for HPC proposes workflow templates as a way to demonstrate compute resource capability with limited benchmarks.

Sources

GPUnion: Autonomous GPU Sharing on Campus

The Case for Time-Shared Computing Resources

Locked In, Leaked Out: Measuring Isolation via Kernel Locks

Using Containers to Speed Up Development, to Run Integration Tests and to Teach About Distributed Systems

GlideinBenchmark: collecting resource information to optimize provisioning

Towards Experiment Execution in Support of Community Benchmark Workflows for HPC
