Advances in GPU Programming and Multi-GPU Communication

The field of GPU programming and multi-GPU communication is moving toward more modular, efficient, and scalable solutions. Researchers are exploring new programming models and languages that balance low-level control with modularity and safety. There is also a growing focus on improving multi-GPU communication, with an emphasis on reducing latency and increasing bandwidth utilization. Noteworthy papers in this area include:

- Modular GPU Programming with Typed Perspectives, which introduces a new GPU language that restores modularity while giving programmers low-level control over collective operations.
- High-Performance N-Queens Solver on GPU, which replaces recursive backtracking with an iterative DFS free of shared-memory bank conflicts, achieving over 10x speedup on identical hardware configurations (see the first sketch below).
- Iris, a multi-GPU communication library for Triton that eliminates the trade-off between performance and programmability.
- ParallelKittens, a minimal CUDA framework that simplifies the development of overlapped multi-GPU kernels (the second sketch below shows the baseline overlap pattern it streamlines).
- Multi-GPU Quantum Circuit Simulation and the Impact of Network Performance, which highlights the importance of interconnect performance in multi-GPU simulations.
- GPU-Initiated Networking for NCCL, which introduces a device-initiated communication model that eliminates CPU coordination overhead (see the third sketch below).
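
The core idea behind the N-Queens result can be illustrated with a hedged CUDA sketch: recursion is replaced by an explicit per-thread stack, and the stacks are transposed in shared memory (indexed `stack[depth][threadIdx.x]`) so that the 32 lanes of a warp always land in 32 distinct banks. The board size, work split, and layout below are illustrative assumptions, not the paper's implementation.

```cuda
// Sketch: iterative, bitmask-based N-Queens DFS on the GPU.
// Per-thread stacks are transposed in shared memory so lane t of a
// warp always touches bank (t % 32) -- the conflict-free layout the
// paper's title refers to. Work split (one first-row column per
// thread) is a simplification assumed for this sketch.
#include <cstdio>
#include <cstdint>

constexpr int N = 12;        // board size (assumption for the sketch)
constexpr int THREADS = 32;  // one warp; only the first N lanes work

__global__ void nqueens_count(unsigned long long *total)
{
    // Stack frame = (cols, ld, rd, avail), one array per field,
    // indexed [depth][lane] so warp accesses are conflict-free.
    __shared__ uint32_t s_cols[N][THREADS], s_ld[N][THREADS];
    __shared__ uint32_t s_rd[N][THREADS], s_avail[N][THREADS];

    const uint32_t full = (1u << N) - 1;
    const int t = threadIdx.x;
    if (t >= N) return;

    // Seed: this thread's queen sits in column t of row 0.
    unsigned long long count = 0;
    uint32_t bit = 1u << t;
    int sp = 0;
    s_cols[0][t]  = bit;
    s_ld[0][t]    = bit << 1;
    s_rd[0][t]    = bit >> 1;
    s_avail[0][t] = full & ~(bit | (bit << 1) | (bit >> 1));

    while (sp >= 0) {
        uint32_t avail = s_avail[sp][t];
        if (avail == 0) { --sp; continue; }      // backtrack
        uint32_t b = avail & (-avail);           // lowest free square
        s_avail[sp][t] = avail & ~b;             // consume it
        uint32_t cols = s_cols[sp][t] | b;
        if (cols == full) { ++count; continue; } // all N queens placed
        uint32_t ld = (s_ld[sp][t] | b) << 1;    // left diagonals
        uint32_t rd = (s_rd[sp][t] | b) >> 1;    // right diagonals
        ++sp;                                    // descend one row
        s_cols[sp][t]  = cols;
        s_ld[sp][t]    = ld;
        s_rd[sp][t]    = rd;
        s_avail[sp][t] = full & ~(cols | ld | rd);
    }
    atomicAdd(total, count);
}

int main()
{
    unsigned long long *d_total, h_total = 0;
    cudaMalloc(&d_total, sizeof(h_total));
    cudaMemcpy(d_total, &h_total, sizeof(h_total), cudaMemcpyHostToDevice);
    nqueens_count<<<1, THREADS>>>(d_total);
    cudaMemcpy(&h_total, d_total, sizeof(h_total), cudaMemcpyDeviceToHost);
    printf("solutions for N=%d: %llu\n", N, h_total); // 14200 for N=12
    cudaFree(d_total);
    return 0;
}
```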
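
For ParallelKittens, the sketch below shows the raw CUDA idiom the framework simplifies, not its API: a buffer is chunked so that peer-to-peer copies of finished chunks run on a communication stream while later chunks are still computing. The two-GPU topology, chunk count, and buffer names are illustrative assumptions, and error checking is omitted.

```cuda
// Sketch: overlapping compute with peer-to-peer communication using
// plain CUDA streams -- the baseline pattern that frameworks like
// ParallelKittens abstract away. Assumes two P2P-capable GPUs.
#include <cuda_runtime.h>

__global__ void compute(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f + 1.0f;   // stand-in for real work
}

int main()
{
    const int n = 1 << 20;
    float *buf0, *buf1;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);       // allow GPU0 -> GPU1 copies
    cudaMalloc(&buf0, n * sizeof(float));

    cudaSetDevice(1);
    cudaMalloc(&buf1, n * sizeof(float));

    cudaSetDevice(0);
    cudaStream_t compute_s, comm_s;
    cudaStreamCreate(&compute_s);
    cudaStreamCreate(&comm_s);

    // While chunk k is being copied to GPU1 on comm_s, chunk k+1 is
    // still being computed on compute_s, so copy and compute overlap.
    const int chunks = 4, chunk = n / chunks;
    for (int k = 0; k < chunks; ++k) {
        float *src = buf0 + k * chunk;
        compute<<<(chunk + 255) / 256, 256, 0, compute_s>>>(src, chunk);

        // Order this chunk's copy after its compute, then let the copy
        // proceed on the communication stream.
        cudaEvent_t done;
        cudaEventCreate(&done);
        cudaEventRecord(done, compute_s);
        cudaStreamWaitEvent(comm_s, done, 0);
        cudaMemcpyPeerAsync(buf1 + k * chunk, 1, src, 0,
                            chunk * sizeof(float), comm_s);
        cudaEventDestroy(done);
    }
    cudaStreamSynchronize(comm_s);
    return 0;
}
```

The event-per-chunk bookkeeping is exactly the kind of boilerplate a framework in this space exists to hide; the overlap itself comes entirely from splitting the buffer across two streams.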
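
Finally, device-initiated communication, the model GPU-Initiated Networking brings to NCCL, can be sketched with NVSHMEM, an existing library with the same property: the kernel itself issues the transfer, so no CPU thread coordinates the communication. The GIN/NCCL API is not reproduced here; the symmetric-buffer setup and kernel below are an illustrative analogue.

```cuda
// Sketch: device-initiated communication via NVSHMEM (an analogue of
// the model GIN adds to NCCL, not the GIN API itself). The kernel
// issues the put directly; no host-side copy or NCCL call is involved.
#include <nvshmem.h>

__global__ void push_partial_sums(float *symm, int n, int peer)
{
    int i = threadIdx.x;                    // single-block sketch
    if (i < n) symm[i] += 1.0f;             // stand-in for compute
    __syncthreads();                        // all writes done in-block
    if (i == 0) {
        // The GPU initiates the transfer itself, straight from the
        // kernel, writing into the peer PE's copy of the buffer.
        nvshmem_float_put(symm, symm, n, peer);
        nvshmem_quiet();                    // wait for completion
    }
}

int main()
{
    nvshmem_init();
    int pe   = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    const int n = 256;

    // Symmetric allocation: the same address is valid on every PE.
    float *symm = (float *)nvshmem_malloc(n * sizeof(float));

    push_partial_sums<<<1, n>>>(symm, n, (pe + 1) % npes);
    cudaDeviceSynchronize();
    nvshmem_barrier_all();

    nvshmem_free(symm);
    nvshmem_finalize();
    return 0;
}
```

Eliminating the CPU from the critical path, as here, is what removes the coordination overhead the GIN paper targets.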

Sources

Modular GPU Programming with Typed Perspectives

High-Performance N-Queens Solver on GPU: Iterative DFS with Zero Bank Conflicts

Iris: First-Class Multi-GPU Programming Experience in Triton

ParallelKittens: Systematic and Practical Simplification of Multi-GPU AI Kernels

Multi-GPU Quantum Circuit Simulation and the Impact of Network Performance

GPU-Initiated Networking for NCCL
