Advancements in AI and HPC Systems

The field of Artificial Intelligence (AI) and High-Performance Computing (HPC) is evolving rapidly, with a focus on efficiency, performance, and scalability. Recent work centers on optimizing memory access, reducing data movement, and increasing computational throughput. In particular, processing-in-memory (PIM) architectures, heterogeneous systems, and advanced networking protocols are reshaping the landscape. These efforts address the growing demands of AI workloads such as large language models and graph analytics, which require substantial compute and memory bandwidth. Research in reconfigurable architectures, 3D-stacked systems, and thermally-aware scheduling is further extending what these platforms can deliver. Overall, the field is moving toward more efficient, scalable, and sustainable solutions for increasingly complex AI and HPC applications.

Noteworthy papers include VectorCDC, which accelerates content-defined data chunking in deduplication systems using vector instructions; TLV-HGNN, which proposes a reconfigurable hardware accelerator for memory-efficient heterogeneous graph neural network (HGNN) inference; and THERMOS, which introduces a thermally-aware multi-objective scheduling framework for AI workloads on heterogeneous multi-chiplet PIM architectures.
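To make the first of these concrete, below is a minimal scalar sketch of content-defined chunking (CDC) with a Gear-style rolling hash, the byte-at-a-time scan that vector-instruction approaches such as VectorCDC aim to accelerate. The gear table, mask, and size bounds here are illustrative assumptions, not values from the paper; a vectorized implementation would test many candidate positions per instruction instead of looping byte by byte.

```python
import random

random.seed(0)
GEAR = [random.getrandbits(64) for _ in range(256)]  # per-byte random table (assumed)
MASK = (1 << 13) - 1                 # low 13 bits zero => ~8 KiB average chunks (assumed)
MIN_SIZE, MAX_SIZE = 2048, 65536     # illustrative chunk-size bounds

def chunk(data: bytes):
    """Yield chunks, cutting wherever the rolling hash matches the mask."""
    start, h = 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFFFFFFFFFF  # Gear rolling-hash update
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]           # trailing partial chunk
```

Because boundaries depend only on content, inserting bytes early in a stream shifts at most a few nearby chunks, which is what makes CDC attractive for deduplication; the inner loop, however, is strictly sequential per byte, hence the appeal of SIMD acceleration.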
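In the same spirit, here is a hypothetical sketch of thermally-aware multi-objective scheduling along the lines of THERMOS: each candidate chiplet is scored on predicted latency, energy, and current temperature, and the best feasible trade-off wins. The Chiplet fields, weights, and thermal limit are invented for illustration; the paper's actual objectives and policy will differ.

```python
from dataclasses import dataclass

@dataclass
class Chiplet:
    name: str
    latency_ms: float   # predicted task latency on this chiplet
    energy_mj: float    # predicted energy for the task
    temp_c: float       # current die temperature

def pick_chiplet(chiplets, w_lat=0.4, w_en=0.3, w_temp=0.3, t_max=85.0):
    """Return the feasible chiplet with the lowest normalized weighted cost."""
    feasible = [c for c in chiplets if c.temp_c < t_max]  # thermal guardrail
    if not feasible:
        return None  # everything is too hot; caller must throttle or wait
    lat_ref = max(c.latency_ms for c in feasible)
    en_ref = max(c.energy_mj for c in feasible)
    def cost(c):
        return (w_lat * c.latency_ms / lat_ref
                + w_en * c.energy_mj / en_ref
                + w_temp * c.temp_c / t_max)
    return min(feasible, key=cost)
```

For example, pick_chiplet([Chiplet("pim0", 3.1, 12.0, 71.0), Chiplet("gpu0", 1.9, 30.0, 83.0)]) weighs the GPU's lower latency against its higher energy and temperature.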

Sources

Accelerating Data Chunking in Deduplication Systems using Vector Instructions

Panel-Scale Reconfigurable Photonic Interconnects for Scalable AI Computation

Slice or the Whole Pie? Utility Control for AI Models

Physical Design Exploration of a Wire-Friendly Domain-Specific Processor for Angstrom-Era Nodes

The Fused Kernel Library: A C++ API to Develop Highly-Efficient GPU Libraries

An Experimental Exploration of In-Memory Computing for Multi-Layer Perceptrons

Tasa: Thermal-aware 3D-Stacked Architecture Design with Bandwidth Sharing for LLM Inference

Coordinated Power Management on Heterogeneous Systems

Towards Lock Modularization for Heterogeneous Environments

TLV-HGNN: Thinking Like a Vertex for Memory-efficient HGNN Inference

Maximizing GPU Efficiency via Optimal Adapter Caching: An Analytical Approach for Multi-Tenant LLM Serving

XDMA: A Distributed, Extensible DMA Architecture for Layout-Flexible Data Movements in Heterogeneous Multi-Accelerator SoCs

Towards Efficient and Practical GPU Multitasking in the Era of LLM

Architecting Long-Context LLM Acceleration with Packing-Prefetch Scheduler and Ultra-Large Capacity On-Chip Memories

Vector-Centric Machine Learning Systems: A Cross-Stack Approach

JSPIM: A Skew-Aware PIM Accelerator for High-Performance Databases Join and Select Operations

OISMA: On-the-fly In-memory Stochastic Multiplication Architecture for Matrix-Multiplication Workloads

Ultra Ethernet's Design Principles and Architectural Innovations

A Limits Study of Memory-side Tiering Telemetry

Design and Simulation of 6T SRAM Array

Re-thinking Memory-Bound Limitations in CGRAs

Energy-efficient PON-based Backhaul Connectivity for a VLC-enabled Indoor Fog Computing Environment

Dalek: An Unconventional and Energy-Aware Heterogeneous Cluster

THERMOS: Thermally-Aware Multi-Objective Scheduling of AI Workloads on Heterogeneous Multi-Chiplet PIM Architectures
