Advancements in GPU Performance Profiling and AI-Driven Computing Infrastructures

Recent work advances both GPU performance profiling and AI-driven computing infrastructures. One line of research addresses the difficulty of gathering comprehensive performance characteristics and value profiles from GPUs deployed in real-world settings. Related efforts include novel architectures that exploit fine-grained, time-varying behavioral diversity in single-threaded workloads, and deep learning frameworks for high-fidelity, in-the-wild simulation on production hardware. There is also a growing trend toward cloud-native solutions and Kubernetes platforms, which ease the development of GPU-powered data analysis workflows and help them scale across heterogeneous distributed computing resources.

Noteworthy papers in this area include SAHM, a state-aware heterogeneous multicore architecture that targets performance gains by exploiting fine-grained behavioral diversity, and NeuroScalar, a deep learning framework for fast, accurate, in-the-wild cycle-level performance prediction. The AI_INFN platform and the Nagare Media Engine are further notable examples, addressing AI-driven computing infrastructure and multimedia workflow systems, respectively.

Sources

Privacy-Preserving Performance Profiling of In-The-Wild GPUs

The AI_INFN Platform: Artificial Intelligence Development in the Cloud

SAHM: State-Aware Heterogeneous Multicore for Single-Thread Performance

NeuroScalar: A Deep Learning Framework for Fast, Accurate, and In-the-Wild Cycle-Level Performance Prediction

From Edge to HPC: Investigating Cross-Facility Data Streaming Architectures

Nagare Media Engine: A System for Cloud- and Edge-Native Network-based Multimedia Workflows

AGOCS -- Accurate Google Cloud Simulator Framework

Leveraging AI modelling for FDS with Simvue: monitor and optimise for more sustainable simulations

CGSim: A Simulation Framework for Large Scale Distributed Computing Environment

Data Management System Analysis for Distributed Computing Workloads
