The field is moving toward optimizing computational performance on emerging architectures, with a focus on mixed-precision arithmetic, sparse neural networks, and compiler transformations. Researchers are exploring strategies for improving the efficiency of matrix multiplication, a fundamental kernel in deep learning and scientific computing, by leveraging hardware advances such as mixed-precision vector units and matrix engines. Noteworthy papers include:
- A Performance Model for Warp Specialization Kernels, which presents a performance model for optimizing GPU-accelerated applications built on warp-specialized kernels.
- Dynamic Sparse Training of Diagonally Sparse Networks, which proposes a structured sparse-to-sparse training method that performs on par with unstructured sparsity while yielding tangible computational gains (see the first sketch after this list).
- The Cambrian Explosion of Mixed-Precision Matrix Multiplication for Quantized Deep Learning Inference, which revisits traditional high-performance matrix multiplication and describes strategies for adapting it to mixed-precision integer arithmetic (see the second sketch after this list).
- A Novel Compiler Transformation for Fast Sparse Matrix Multiplication in GPUs, which introduces a compiler transformation that accelerates sparse matrix-matrix multiplication on GPU devices (see the third sketch after this list).
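To make the diagonal structure concrete, the first sketch below restricts a weight matrix to a handful of wrapped diagonals and shows the cheaper matrix-vector product that structure permits. The masking helper, the matrix size, and the choice of offsets are illustrative assumptions rather than the paper's construction, and the dynamic prune-and-regrow schedule of the training method is omitted.

```python
import numpy as np

def diagonal_mask(n, offsets):
    """Boolean mask keeping only the selected wrapped diagonals of an n x n matrix."""
    rows = np.arange(n)[:, None]              # (n, 1)
    cols = (rows + np.asarray(offsets)) % n   # (n, k): one column per kept diagonal
    mask = np.zeros((n, n), dtype=bool)
    mask[rows, cols] = True
    return mask

n, offsets = 8, [0, 1, 3]                     # hypothetical: keep 3 of the 8 diagonals
W = np.random.randn(n, n)
W_sparse = W * diagonal_mask(n, offsets)      # structured sparse weight, density 3/8
x = np.random.randn(n)

# Reference: ordinary dense mat-vec, O(n^2) multiply-adds.
y_dense = W_sparse @ x

# Structured path: each kept diagonal contributes an element-wise product with a
# rotated copy of x, so the cost drops to O(n * len(offsets)) -- the source of the
# computational gains a structured pattern enables over unstructured sparsity.
rows = np.arange(n)
y_diag = np.zeros(n)
for d in offsets:
    cols = (rows + d) % n
    y_diag += W_sparse[rows, cols] * x[cols]

assert np.allclose(y_dense, y_diag)
# A dynamic sparse training scheme would additionally prune and regrow whole
# diagonals during training; that schedule is not shown here.
```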
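The second sketch illustrates the kind of mixed-precision integer arithmetic quantized-inference GEMMs target: 8-bit operands multiplied with 32-bit accumulation, then dequantized. It is a minimal NumPy illustration, not the paper's micro-kernels; the shapes and per-tensor scales are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Quantized operands, as a typical 8-bit inference pipeline would produce.
A = rng.integers(-128, 128, size=(4, 16), dtype=np.int8)   # activations
B = rng.integers(-128, 128, size=(16, 6), dtype=np.int8)   # weights
scale_a, scale_b = 0.05, 0.02                               # per-tensor scales (illustrative)

# Mixed-precision core: 8-bit inputs, 32-bit accumulators.  Widening before the
# dot product mirrors what int8 vector units and matrix engines do in hardware
# and avoids overflow (16 products of magnitude <= 127*128 fit easily in int32).
C_i32 = A.astype(np.int32) @ B.astype(np.int32)

# Dequantize back to floating point for the next layer (or requantize to int8).
C_f32 = C_i32.astype(np.float32) * (scale_a * scale_b)

# Reference check against a float GEMM on the dequantized operands.
ref = (A.astype(np.float32) * scale_a) @ (B.astype(np.float32) * scale_b)
assert np.allclose(C_f32, ref, atol=1e-4)
```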
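The summary does not describe the transformation itself, so the third sketch only shows the baseline computation such GPU-oriented transformations reorganize: a CSR sparse-times-dense product with its irregular, row-by-row loop nest. The CSR layout is standard; the sparse-times-dense formulation, shapes, and density are assumptions (the paper's target may equally be a sparse-times-sparse product).

```python
import numpy as np

def dense_to_csr(A):
    """Compressed Sparse Row arrays (indptr, indices, data) for a dense matrix."""
    indptr, indices, data = [0], [], []
    for row in A:
        nz = np.nonzero(row)[0]
        indices.extend(nz)
        data.extend(row[nz])
        indptr.append(len(indices))
    return np.array(indptr), np.array(indices), np.array(data)

def spmm_csr(indptr, indices, data, B):
    """Row-by-row sparse x dense product: the irregular loop nest that GPU
    transformations restructure for memory coalescing and load balance."""
    C = np.zeros((len(indptr) - 1, B.shape[1]))
    for i in range(len(indptr) - 1):                  # one (uneven) unit of work per row
        for p in range(indptr[i], indptr[i + 1]):     # nonzeros of row i
            C[i] += data[p] * B[indices[p]]           # scale a row of B and accumulate
    return C

rng = np.random.default_rng(1)
A = rng.random((6, 8)) * (rng.random((6, 8)) < 0.25)  # roughly 25%-dense sparse operand
B = rng.random((8, 5))
indptr, indices, data = dense_to_csr(A)
assert np.allclose(spmm_csr(indptr, indices, data, B), A @ B)
```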