Advances in High-Performance Computing Architectures and Algorithms

The field of high-performance computing is advancing on two fronts: hardware architectures and the algorithms that run on them. Researchers are exploring ways to balance compute throughput, memory capacity and bandwidth, and communication efficiency. One key direction is specialized hardware, such as GPUs and inter-core connected AI chips, designed to meet the growing demands of deep learning and scientific simulation. Another is algorithmic improvement, including matrix-unit-accelerated stencil computation, Markov Chain Monte Carlo acceleration, and mixed-precision algorithms that reduce pressure on the memory wall. Together, these advances enable faster and more efficient processing of complex workloads in fields such as physics, materials science, and biology.

Noteworthy papers include OpenGCRAM, which introduces an open-source gain cell compiler enabling design-space exploration of memory subsystems for AI workloads; MC$^2$A, which presents an algorithm-hardware co-design framework for efficient Markov Chain Monte Carlo acceleration; and Elk, which develops a deep learning compiler framework to maximize the efficiency of inter-core connected AI chips.
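To make the mixed-precision idea concrete, below is a minimal NumPy sketch of iterative refinement, the classic pattern behind many mixed-precision solvers: perform the expensive solve in float32 and accumulate residual corrections in float64. It assumes a small, well-conditioned dense system and illustrates the general technique only; it is not the HPG-MxP implementation, and the function and variable names are illustrative.

```python
import numpy as np

def mixed_precision_solve(A, b, iters=5):
    """Iterative refinement: cheap float32 solves, float64 residuals.

    A sketch of the generic mixed-precision pattern; a real solver
    would factor A32 once and reuse the factorization.
    """
    A32 = A.astype(np.float32)                          # low-precision working copy
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                                   # residual in full precision
        d = np.linalg.solve(A32, r.astype(np.float32))  # low-precision correction
        x += d.astype(np.float64)                       # refine high-precision iterate
    return x

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)         # diagonally dominant, well-conditioned
b = rng.standard_normal(n)
x = mixed_precision_solve(A, b)
print(np.linalg.norm(b - A @ x))                        # residual near float64 round-off
```

The payoff on real hardware is that the low-precision solves move half the bytes of their float64 counterparts, which is exactly the memory-wall pressure that mixed-precision work on exascale machines targets.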

Sources

Dissecting the NVIDIA Blackwell Architecture with Microbenchmarks

OpenGCRAM: An Open-Source Gain Cell Compiler Enabling Design-Space Exploration for AI Workloads

MMStencil: Optimizing High-order Stencils on Multicore CPU using Matrix Unit

The Multiple Time-Stepping Method for 3-Body Interactions in High Performance Molecular Dynamics Simulations

Cyclic Data Streaming on GPUs for Short Range Stencils Applied to Molecular Dynamics

Elk: Exploring the Efficiency of Inter-core Connected AI Chips with Deep Learning Compiler Techniques

Scaling the memory wall using mixed-precision -- HPG-MxP on an exascale machine

MC$^2$A: Enabling Algorithm-Hardware Co-Design for Efficient Markov Chain Monte Carlo Acceleration
