Optimizing Computational Performance in Emerging Architectures

Recent work is converging on optimizing computational performance on emerging architectures, with a focus on mixed-precision arithmetic, sparse neural networks, and compiler transformations. Researchers are exploring strategies to improve the efficiency of matrix multiplication, a fundamental kernel in deep learning and scientific computing, by exploiting hardware advances such as mixed-precision vector units and matrix engines. Noteworthy papers include:

  • A Performance Model for Warp Specialization Kernels, which models the performance of warp-specialized kernels to guide the optimization of GPU-accelerated applications.
  • Dynamic Sparse Training of Diagonally Sparse Networks, which proposes a novel structured sparse-to-sparse method that performs on par with unstructured sparsity while delivering tangible computational gains (a sketch of diagonal sparsity follows this list).
  • The Cambrian Explosion of Mixed-Precision Matrix Multiplication for Quantized Deep Learning Inference, which revisits traditional high-performance matrix multiplication and describes strategies for adapting it to mixed-precision integer arithmetic (see the quantized GEMM sketch after this list).
  • A Novel Compiler Transformation for Fast Sparse Matrix Multiplication in GPUs, which proposes a new compiler transformation that accelerates sparse matrix-matrix multiplication (SpMM) on GPU devices (the baseline kernel is sketched after this list).
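
One natural reading of diagonal sparsity, offered here as an assumption rather than the paper's exact formulation, is that each weight matrix keeps nonzeros only on a small set of diagonals, so a matrix-vector product reduces to a few shifted elementwise multiplies. A minimal NumPy sketch:

```python
import numpy as np

def diag_sparse_matvec(diagonals, offsets, x):
    """y = W @ x for an n x n matrix W whose only nonzeros lie on the
    given diagonals (offset 0 = main, >0 = above, <0 = below).

    diagonals[k] holds the n - |offsets[k]| entries of that diagonal.
    Cost is O(n * num_diagonals) instead of O(n^2).
    """
    n = len(x)
    y = np.zeros(n, dtype=x.dtype)
    for d, off in zip(diagonals, offsets):
        if off >= 0:
            y[: n - off] += d * x[off:]   # entries W[i, i+off] * x[i+off]
        else:
            y[-off:] += d * x[: n + off]  # entries W[i, i+off], i >= -off
    return y

# Example: a 5x5 matrix with the main diagonal and the first superdiagonal.
x = np.arange(5, dtype=np.float64)
y = diag_sparse_matvec([np.ones(5), 2 * np.ones(4)], [0, 1], x)
W = np.eye(5) + 2 * np.eye(5, k=1)
assert np.allclose(y, W @ x)
```

Because the set of active diagonals can change between training steps, this layout stays structured (and hence hardware-friendly) while still allowing sparse-to-sparse updates.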
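To make the mixed-precision integer arithmetic concrete, below is a minimal NumPy sketch of the quantized GEMM pattern: int8 operands multiplied with int32 accumulation, then dequantized with per-tensor scales. The function name and the symmetric per-tensor quantization scheme are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def quantized_matmul(a_int8, b_int8, scale_a, scale_b):
    """Illustrative mixed-precision GEMM: int8 inputs, int32 accumulation.

    Matrix engines typically form the int8 x int8 products and accumulate
    them in 32-bit registers to avoid overflow; the result is dequantized
    back to float32 using the operands' scales.
    """
    acc = a_int8.astype(np.int32) @ b_int8.astype(np.int32)  # int32 accumulate
    return acc.astype(np.float32) * (scale_a * scale_b)      # dequantize

# Example: quantize float matrices to int8, multiply, compare to float GEMM.
rng = np.random.default_rng(0)
a = rng.standard_normal((64, 32)).astype(np.float32)
b = rng.standard_normal((32, 48)).astype(np.float32)
scale_a = np.abs(a).max() / 127.0
scale_b = np.abs(b).max() / 127.0
a_q = np.round(a / scale_a).astype(np.int8)
b_q = np.round(b / scale_b).astype(np.int8)
approx = quantized_matmul(a_q, b_q, scale_a, scale_b)
print(np.max(np.abs(approx - a @ b)))  # small quantization error
```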
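Finally, the kernel that the compiler transformation targets, sparse matrix-matrix multiplication, can be stated as a simple CSR row-gather loop. The sketch below is the naive sequential baseline that such transformations restructure for GPU parallelism, not the paper's generated code.

```python
import numpy as np

def csr_spmm(indptr, indices, data, B):
    """C = A @ B where A is sparse in CSR form and B is dense.

    Each row of C is a weighted sum of the rows of B selected by the
    column indices of A's nonzeros in that row.
    """
    n_rows = len(indptr) - 1
    C = np.zeros((n_rows, B.shape[1]), dtype=B.dtype)
    for i in range(n_rows):
        for nz in range(indptr[i], indptr[i + 1]):
            C[i] += data[nz] * B[indices[nz]]
    return C

# Example: A = [[1, 0, 2], [0, 3, 0]] in CSR, multiplied by a dense B.
indptr = np.array([0, 2, 3])
indices = np.array([0, 2, 1])
data = np.array([1.0, 2.0, 3.0])
B = np.arange(6, dtype=np.float64).reshape(3, 2)
print(csr_spmm(indptr, indices, data, B))
# [[ 8. 11.]
#  [ 6.  9.]]
```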

Sources

A Performance Model for Warp Specialization Kernels

Analysis of Floating-Point Matrix Multiplication Computed via Integer Arithmetic

Dynamic Sparse Training of Diagonally Sparse Networks

The Cambrian Explosion of Mixed-Precision Matrix Multiplication for Quantized Deep Learning Inference

A Novel Compiler Transformation for Fast Sparse Matrix Multiplication in GPUs
