Advancements in HPC and GPU-Accelerated Computing

The field of high-performance computing (HPC) and GPU-accelerated computing is rapidly advancing, driven by increasing demand for efficient data movement and communication within HPC applications. Recent research has focused on optimizing inter-APU communication, unified physical memory, and collective communication operations. Innovations in programming interfaces, allocators, and data movement have led to significant performance improvements, and novel approaches to graph neural network training and graph condensation have shown promising results in reducing communication overhead and improving scalability. Overall, the field is moving toward more efficient, scalable, and practical solutions for HPC and GPU-accelerated computing.

Noteworthy papers include:

Inter-APU Communication on AMD MI300A Systems via Infinity Fabric, which evaluates direct memory access and collective multi-APU communication.

Dissecting CPU-GPU Unified Physical Memory on AMD MI300A APUs, which characterizes the UPM architecture and proposes porting strategies for applications.

Persistent and Partitioned MPI for Stencil Communication, which presents performance optimizations for stencil communication.

Optimizing Allreduce Operations for Heterogeneous Architectures with Multiple Processes per GPU, which achieves speedups of up to 2.45x for large MPI all-reduces.

CaPGNN, which reduces communication costs by up to 96% and accelerates GNN training by up to 12.7x.

Multi-view Graph Condensation via Tensor Decomposition, which effectively reduces graph size while preserving GNN performance.
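The summaries above do not describe the specific allreduce algorithms the papers optimize. As a generic illustration of the collective operation involved, here is a minimal pure-Python simulation of a ring allreduce (sum) across hypothetical ranks; the function name and the message-scheduling details are this sketch's assumptions, not the papers' method, and real implementations run the two phases concurrently over MPI or GPU interconnects:

```python
def ring_allreduce(buffers):
    """Simulate a ring allreduce (sum) over p 'ranks'.

    buffers: list of p equal-length vectors, one per simulated rank.
    Returns a list of p vectors, each equal to the element-wise sum.
    Pure-Python stand-in for MPI_Allreduce, for illustration only.
    """
    p = len(buffers)
    n = len(buffers[0])
    assert n % p == 0, "vector length must divide evenly into p chunks"
    chunk = n // p
    data = [list(b) for b in buffers]

    def seg(idx):  # slice covering chunk `idx`
        return slice(idx * chunk, (idx + 1) * chunk)

    # Phase 1: reduce-scatter. At step t, rank r sends chunk (r - t) mod p
    # to its ring neighbour, which accumulates it. After p-1 steps, rank r
    # holds the fully reduced chunk (r + 1) mod p.
    for step in range(p - 1):
        msgs = [(r, (r - step) % p, data[r][seg((r - step) % p)])
                for r in range(p)]          # buffer all sends first
        for r, idx, payload in msgs:
            dst = (r + 1) % p
            for k, v in enumerate(payload):
                data[dst][idx * chunk + k] += v

    # Phase 2: allgather. Circulate the reduced chunks around the ring,
    # overwriting, until every rank has the complete summed vector.
    for step in range(p - 1):
        msgs = [(r, (r + 1 - step) % p, data[r][seg((r + 1 - step) % p)])
                for r in range(p)]
        for r, idx, payload in msgs:
            data[(r + 1) % p][seg(idx)] = payload

    return data
```

Each rank sends and receives 2(p-1) chunks of size n/p, so the bytes moved per rank stay near 2n regardless of p, which is why ring-style schedules are the common baseline that heterogeneity-aware allreduce work builds on.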

Sources

Inter-APU Communication on AMD MI300A Systems via Infinity Fabric: a Deep Dive

Dissecting CPU-GPU Unified Physical Memory on AMD MI300A APUs

Persistent and Partitioned MPI for Stencil Communication

Optimizing Allreduce Operations for Heterogeneous Architectures with Multiple Processes per GPU

CaPGNN: Optimizing Parallel Graph Neural Network Training with Joint Caching and Resource-Aware Graph Partitioning

Multi-view Graph Condensation via Tensor Decomposition
