Advances in High-Performance Computing and Matrix Operations

The field of high-performance computing is seeing significant developments, driven by the need for efficient matrix operations and better memory management. Recent research has focused on optimizing dense matrix multiplication, sparse matrix operations, and memory allocation on heterogeneous architectures, where GPUs and specialized accelerators have delivered substantial performance gains. Algorithmic and architectural techniques such as hierarchical tiling and memory-aware hardware designs are also being explored to minimize redundant computation and reduce memory bandwidth demands. These advances matter for applications including scientific computing, machine learning, and computational biology.

Noteworthy papers include:

Accelerating Matrix Multiplication: A Performance Comparison Between Multi-Core CPU and GPU, which shows how strongly many-core GPU architectures can accelerate data-parallel workloads compared with multi-core CPUs.

A Fast Parallel Median Filtering Algorithm Using Hierarchical Tiling, which introduces an algorithm that reports per-pixel complexities lower than those previously achieved by sorting-based median filtering methods.

RIMMS: Runtime Integrated Memory Management System for Heterogeneous Computing, which presents a lightweight, runtime-managed, hardware-agnostic memory abstraction layer that decouples application development from low-level memory operations.
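Tiling (also called blocking) is the basic idea behind many of the memory-bandwidth optimizations for matrix multiplication discussed here: the product is computed block by block, so each small sub-block of the inputs is reused many times while it sits in fast memory (a CPU cache, or GPU shared memory) instead of being re-read from main memory. The following is a minimal sketch in Python, assuming dense matrices stored as lists of row lists; the names `matmul_naive`, `matmul_tiled`, and the `tile` parameter are hypothetical and not taken from any of the cited papers. Pure Python will not show the cache benefit; the sketch only illustrates the loop structure that compiled CPU or GPU kernels use.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def _dims(a: Matrix, b: Matrix) -> Tuple[int, int, int]:
    """Return (n, k, m) for an n x k matrix A and a k x m matrix B."""
    n, k = len(a), len(b)
    m = len(b[0]) if b else 0
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B do not match")
    return n, k, m


def matmul_naive(a: Matrix, b: Matrix) -> Matrix:
    """Reference triple-loop product C = A @ B."""
    n, k, m = _dims(a, b)
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            aip = a[i][p]
            for j in range(m):
                c[i][j] += aip * b[p][j]
    return c


def matmul_tiled(a: Matrix, b: Matrix, tile: int = 32) -> Matrix:
    """Blocked product C = A @ B using tile x tile sub-blocks.

    In a compiled kernel, `tile` would be chosen (hypothetically) so that
    one block each of A, B and C fits in cache or shared memory.
    """
    if tile < 1:
        raise ValueError("tile must be a positive integer")
    n, k, m = _dims(a, b)
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        i1 = min(i0 + tile, n)
        for p0 in range(0, k, tile):
            p1 = min(p0 + tile, k)
            for j0 in range(0, m, tile):
                j1 = min(j0 + tile, m)
                # Accumulate A[i0:i1, p0:p1] @ B[p0:p1, j0:j1]
                # into the C[i0:i1, j0:j1] block.
                for i in range(i0, i1):
                    a_row, c_row = a[i], c[i]
                    for p in range(p0, p1):
                        aip, b_row = a_row[p], b[p]
                        for j in range(j0, j1):
                            c_row[j] += aip * b_row[j]
    return c


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    print(matmul_tiled(A, B, tile=1))  # [[19.0, 22.0], [43.0, 50.0]]
```

Because the blocks are visited in increasing order of the inner index, each entry of C receives its partial products in the same order as in the naive loop, so both functions return identical results; only the memory access pattern changes.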

Sources

CUTHERMO: Understanding GPU Memory Inefficiencies with Heat Map Profiling

A Formalization of Elementary Linear Algebra: Part I

A Formalization of Elementary Linear Algebra: Part II

Query Efficient Structured Matrix Learning

Accelerating Matrix Multiplication: A Performance Comparison Between Multi-Core CPU and GPU

A Fast Parallel Median Filtering Algorithm Using Hierarchical Tiling

Racing to Idle: Energy Efficiency of Matrix Multiplication on Heterogeneous CPU and GPU Architectures

Subset selection for matrices in spectral norm

RIMMS: Runtime Integrated Memory Management System for Heterogeneous Computing

On rank-2 Nonnegative Matrix Factorizations and their variants

Exascale Implicit Kinetic Plasma Simulations on El Capitan for Solving the Micro-Macro Coupling in Magnetospheric Physics

Smith normal forms of bivariate polynomial matrices

Improving SpGEMM Performance Through Matrix Reordering and Cluster-wise Computation

The Performance of Low-Synchronization Variants of Reorthogonalized Block Classical Gram–Schmidt

A Customized Memory-aware Architecture for Biological Sequence Alignment

Minimizing CGYRO HPC Communication Costs in Ensembles with XGYRO by Sharing the Collisional Constant Tensor Structure

Leveraging Caliper and Benchpark to Analyze MPI Communication Patterns: Insights from AMG2023, Kripke, and Laghos

H2SGEMM: Emulating FP32 GEMM on Ascend NPUs using FP16 Units with Precision Recovery and Cache-Aware Optimization

Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions

Regularization of Inverse Problems by Filtered Diagonal Frame Decomposition under general source