GPU research continues to push toward greater efficiency and speed through novel implementations and optimizations. Recent work harnesses GPU tensor cores and warp-level features to accelerate computations such as beamforming and wavelet tree operations, with some implementations reporting speedups of up to 4x and energy efficiencies above 10 TeraOps/J. In addition, the integration of CPU and GPU cores in a single package has enabled fresh comparisons of CPU and GPU compute tradeoffs, highlighting the benefits of simultaneous multithreading and of brute-force approaches for memory-bound algorithms. Noteworthy papers include:
- The Tensor-Core Beamformer, which introduces a generic, optimized beamformer library that achieves over 600 TeraOps/s on an NVIDIA A100 GPU and breaks the 3 PetaOps/s barrier on an AMD MI300X GPU.
- Hardware vs. Software Implementation of Warp-Level Features in Vortex RISC-V GPU, which explores hardware and software implementations of warp-level features in RISC-V GPUs and achieves a geometric-mean IPC speedup of up to 4x in microbenchmarks.
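The key idea behind tensor-core beamforming is that forming output beams is a complex matrix multiplication, the operation tensor cores are built for. The sketch below illustrates this formulation in NumPy; the array shapes, variable names, and random data are illustrative assumptions, not the paper's actual library or API.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical problem sizes for illustration only.
n_beams, n_antennas, n_samples = 4, 8, 16

# Complex beam weights (n_beams x n_antennas) and antenna samples
# (n_antennas x n_samples), filled with random data for the sketch.
weights = rng.standard_normal((n_beams, n_antennas)) \
    + 1j * rng.standard_normal((n_beams, n_antennas))
samples = rng.standard_normal((n_antennas, n_samples)) \
    + 1j * rng.standard_normal((n_antennas, n_samples))

# Beamforming as one GEMM: each output beam is a weighted sum over
# antennas for every time sample, beams[b, t] = sum_a w[b, a] * x[a, t].
beams = weights @ samples

# Explicit loop computing the same thing, to make the reduction visible.
ref = np.empty((n_beams, n_samples), dtype=complex)
for b in range(n_beams):
    for t in range(n_samples):
        ref[b, t] = sum(weights[b, a] * samples[a, t]
                        for a in range(n_antennas))

assert np.allclose(beams, ref)
```

Casting the reduction over antennas as a matrix product is what lets a beamformer inherit the throughput of tensor-core GEMM kernels instead of relying on hand-written accumulation loops.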
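To make "warp-level features" concrete, the snippet below models the semantics of a warp-wide ballot vote, one of the cross-lane primitives that can be provided in hardware or emulated in software. This is a plain-Python behavioral model of the well-known primitive, not the Vortex RISC-V implementation; the function name and 8-lane warp size are assumptions for the example.

```python
def warp_ballot(predicates):
    """Model of a warp-wide ballot: pack one predicate bit per lane
    into an integer mask, with lane 0 in bit 0."""
    mask = 0
    for lane, pred in enumerate(predicates):
        if pred:
            mask |= 1 << lane
    return mask

# Hypothetical 8-lane warp where lanes 0, 2, and 3 vote true:
# bits 0, 2, and 3 are set, giving the mask 0b1101.
assert warp_ballot([True, False, True, True,
                    False, False, False, False]) == 0b1101
```

A hardware implementation computes this mask in a single cross-lane operation, while a software emulation must gather each lane's predicate through shared state, which is one source of the IPC gap the paper measures.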