Optimizations and Advances in Matrix Multiplication and Workflow Scheduling

The field of high-performance computing is seeing significant advances in matrix multiplication and workflow scheduling. On the matrix-multiplication side, researchers are exploring low-rank approximations and flip-graph search over ternary fields, both to reduce arithmetic cost below that of dense cubic-time GEMM and to discover new multiplication schemes. In workflow scheduling, there is growing focus on frameworks that efficiently manage GPU resources in multi-tenant environments while mitigating fragmentation, on software frameworks that automatically produce statically scheduled and compiled code, and on NUMA-aware workflow execution runtime systems. Together, these innovations have the potential to significantly improve the efficiency and scalability of a broad range of computational workloads. Noteworthy papers include Low-Rank GEMM, which reports up to 378 TFLOPS on matrices up to N=20480, and Fast Matrix Multiplication via Ternary Meta Flip Graphs, which discovers new matrix multiplication schemes over the ternary field.
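The arithmetic savings behind low-rank GEMM can be illustrated with a minimal NumPy sketch. This is not the paper's implementation (Low-Rank GEMM targets FP8 tensor-core acceleration); it only shows the core idea, assuming one factor is well approximated by a truncated SVD, which turns one full N×N×N product into two thin products costing roughly O(N²r):

```python
import numpy as np

def low_rank_gemm(A, B, rank):
    """Approximate A @ B by replacing B with a rank-`rank` truncated SVD.

    Illustrative sketch only: cost drops from O(m*k*n) for the dense
    product to O(m*k*rank + m*rank*n) for the two thin products,
    plus the one-off factorization of B.
    """
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    Ur = U[:, :rank] * s[:rank]   # fold singular values into the left factor
    Vr = Vt[:rank, :]
    return (A @ Ur) @ Vr          # two thin GEMMs instead of one full GEMM

# Usage: when B is numerically low rank, a small `rank` suffices.
rng = np.random.default_rng(0)
A = rng.standard_normal((256, 256))
B = rng.standard_normal((256, 8)) @ rng.standard_normal((8, 256))  # exact rank 8
C_exact = A @ B
C_approx = low_rank_gemm(A, B, rank=8)
print(np.allclose(C_exact, C_approx))
```

Since B here has exact rank 8, truncating the SVD at rank 8 loses nothing and the approximate product matches the dense one to floating-point tolerance; for general matrices the truncation rank trades accuracy for speed.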

Sources

Low-Rank GEMM: Efficient Matrix Multiplication via Low-Rank Approximation with FP8 Acceleration

An Online Fragmentation-Aware GPU Scheduler for Multi-Tenant MIG-based Clouds

Optimizations on Graph-Level for Domain Specific Computations in Julia and Application to QED

Enabling Scientific Workflow Scheduling Research in Non-Uniform Memory Access Architectures

Compilation of Generalized Matrix Chains with Symbolic Sizes

Fast Matrix Multiplication via Ternary Meta Flip Graphs

Sublinear Time Low-Rank Approximation of Hankel Matrices

Modeling the Effect of Data Redundancy on Speedup in MLFMA Near-Field Computation
