Advances in Efficient Computing for Large Language Models and Genomic Analysis

The field of computing is seeing rapid progress in the efficient processing and analysis of large language models (LLMs) and genomic data. Recent work focuses on optimizing computational resources, mitigating the memory wall, and exploiting parallelism to accelerate inference and reduce processing times. Notably, new architectures and algorithms are being proposed to tackle the challenges posed by LLMs, including coherence-aware task graph modeling, hybrid and dynamic parallelism, and memory-compute-efficient accelerators. Research is also exploring heterogeneous processing-in-memory, resistive-memory-based neural differential equation solvers, and pervasive context management to raise throughput and reduce latency. These advances stand to benefit applications ranging from autonomous driving and natural language processing to genomic analysis.
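To make the memory-wall pressure concrete, the back-of-the-envelope sketch below estimates the key-value-cache footprint of long-context LLM inference. The model dimensions are illustrative assumptions, roughly those of a 7B-parameter transformer, not figures taken from any of the surveyed papers.

```python
# Back-of-the-envelope estimate of the KV-cache footprint that drives the
# "memory wall" in long-context LLM inference. The dimensions below are
# illustrative assumptions (roughly 7B-scale), not values from the papers.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch,
                   bytes_per_elem=2):
    # Two tensors (K and V) per layer, each [batch, n_kv_heads, seq_len, head_dim].
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

if __name__ == "__main__":
    for ctx in (4_096, 32_768, 131_072):
        gib = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128,
                             seq_len=ctx, batch=1) / 2**30
        print(f"context {ctx:>7,} tokens -> ~{gib:6.1f} GiB of KV cache (fp16)")
```

Under these assumptions the cache grows from about 2 GiB at a 4K context to about 64 GiB at 128K, approaching the HBM capacity of a single accelerator; this is the pressure that motivates the processing-in-memory, offloading, and context-reuse work surveyed below.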

Some noteworthy papers in this regard include:

Optimizing the Variant Calling Pipeline Execution on Human Genomes Using GPU-Enabled Machines proposes a machine-learning-based approach for optimizing the execution of variant calling pipelines on GPU-enabled machines.

Coherence-Aware Task Graph Modeling for Realistic Application introduces a framework for constructing unified task graphs that reflect runtime behavior and incorporate cache coherence.

Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference presents a hardware-software co-designed system for optimizing long-context LLM inference.

HD-MoE: Hybrid and Dynamic Parallelism for Mixture-of-Expert LLMs with 3D Near-Memory Processing proposes an automatic hybrid parallel mapping algorithm and a dynamic scheduling strategy for optimizing MoE parallel computation.

MCBP: A Memory-Compute Efficient LLM Inference Accelerator Leveraging Bit-Slice-enabled Sparsity and Repetitiveness introduces a bit-grained, compute- and memory-efficient algorithm-hardware co-design for accelerating LLM inference (a conceptual sketch of the bit-slice idea appears after this list).

UrgenGo: Urgency-Aware Transparent GPU Kernel Launching for Autonomous Driving presents a non-intrusive, urgency-aware GPU scheduling system for autonomous driving workloads.

IsoSched: Preemptive Tile Cascaded Scheduling of Multi-DNN via Subgraph Isomorphism enables preemptive multi-DNN scheduling on tile-based spatial architectures via subgraph isomorphism.

Efficient lattice field theory simulation using adaptive normalizing flow on a resistive memory-based neural differential equation solver integrates an adaptive normalizing flow model with a resistive memory-based neural differential equation solver for efficient lattice field theory simulation.

HPIM: Heterogeneous Processing-In-Memory-based Accelerator for Large Language Models Inference proposes a memory-centric, heterogeneous processing-in-memory accelerator for LLM inference.

Scaling Up Throughput-oriented LLM Inference Applications on Heterogeneous Opportunistic GPU Clusters with Pervasive Context Management exploits the computational context common to LLM applications to enable seamless context reuse on opportunistic resources.

LEAP: LLM Inference on Scalable PIM-NoC Architecture with Balanced Dataflow and Fine-Grained Parallelism proposes a computation/memory/communication co-designed non-von Neumann accelerator for LLM inference.

The Energy-Efficient Hierarchical Neural Network with Fast FPGA-Based Incremental Learning combines hierarchical decomposition with FPGA-based direct equation solving and incremental learning for efficient, sustainable neural network processing.
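To give a flavor of the bit-slice sparsity and repetitiveness that MCBP leverages, the following minimal Python sketch decomposes a toy low-precision weight tile into bit-planes and counts the all-zero and repeated slices an accelerator could skip or share. The tile size, contents, and 4-bit precision are illustrative assumptions, and this is a conceptual sketch of the general idea, not MCBP's actual dataflow.

```python
import numpy as np

# Conceptual sketch of bit-slice (bit-plane) decomposition: even a dense
# low-precision weight tile often contains all-zero or repeated bit slices
# that hardware can skip or compute once and reuse. Toy example only.

rng = np.random.default_rng(0)
# Toy unsigned 4-bit weight tile; real designs also handle signed values.
w = rng.integers(0, 16, size=(8, 8), dtype=np.uint8)

# Slice the 4-bit weights into four binary bit-planes (LSB first).
slices = [(w >> b) & 1 for b in range(4)]

for b, s in enumerate(slices):
    zero_rows = int((s.sum(axis=1) == 0).sum())   # rows with no work to do
    uniq_rows = len({r.tobytes() for r in s})     # distinct rows worth computing
    print(f"bit {b}: {zero_rows} all-zero rows, {uniq_rows}/8 unique rows")
```

In a real accelerator, skipped zero slices save both compute and memory traffic, and repeated slices can be computed once and shared; the sketch only shows that such structure exists even in dense-looking weights.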

Sources

Optimizing the Variant Calling Pipeline Execution on Human Genomes Using GPU-Enabled Machines

Coherence-Aware Task Graph Modeling for Realistic Application

Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference

HD-MoE: Hybrid and Dynamic Parallelism for Mixture-of-Expert LLMs with 3D Near-Memory Processing

MCBP: A Memory-Compute Efficient LLM Inference Accelerator Leveraging Bit-Slice-enabled Sparsity and Repetitiveness

UrgenGo: Urgency-Aware Transparent GPU Kernel Launching for Autonomous Driving

IsoSched: Preemptive Tile Cascaded Scheduling of Multi-DNN via Subgraph Isomorphism

Efficient lattice field theory simulation using adaptive normalizing flow on a resistive memory-based neural differential equation solver

HPIM: Heterogeneous Processing-In-Memory-based Accelerator for Large Language Models Inference

Scaling Up Throughput-oriented LLM Inference Applications on Heterogeneous Opportunistic GPU Clusters with Pervasive Context Management

LEAP: LLM Inference on Scalable PIM-NoC Architecture with Balanced Dataflow and Fine-Grained Parallelism

The Energy-Efficient Hierarchical Neural Network with Fast FPGA-Based Incremental Learning
