Efficient Models and Optimized Architectures in AI Research

The field of artificial intelligence is moving toward more efficient models and optimized architectures. Researchers are working to improve the performance of large language models while reducing their computational cost. One key direction is mixture-of-experts models, which have shown significant promise in multitask adaptability. Another is energy-efficient neural architecture search, which identifies architectures that minimize energy consumption while maintaining acceptable accuracy. Genetic algorithms and other optimization techniques are also gaining traction in the field.

Noteworthy papers in this area include the BabyLM Challenge, which reports findings on sample-efficient pretraining with developmentally plausible corpora; Kernel-Level Energy-Efficient Neural Architecture Search, which proposes a method for identifying energy-efficient architectures; EMAFusion, which demonstrates a self-optimizing approach to large language model selection and integration; and HELIOS, which proposes adaptive model and early-exit selection for efficient large language model inference serving.
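To make the mixture-of-experts direction concrete, the sketch below shows generic top-k expert routing: a gating network scores experts per token, and only the k highest-scoring experts run, which is what keeps per-token compute low. This is a minimal illustrative example in NumPy, not the method of any specific paper listed here; the shapes, the `top_k=2` setting, and the use of plain linear experts are assumptions made for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def moe_forward(tokens, gate_w, expert_ws, top_k=2):
    """Route each token to its top-k experts and mix their outputs.

    tokens:    (n_tokens, d_model) input activations
    gate_w:    (d_model, n_experts) gating weights
    expert_ws: list of (d_model, d_model) expert weight matrices
    """
    logits = tokens @ gate_w                           # (n_tokens, n_experts)
    probs = softmax(logits, axis=-1)
    top_idx = np.argsort(-probs, axis=-1)[:, :top_k]   # top-k expert indices per token

    out = np.zeros_like(tokens)
    for t, token in enumerate(tokens):
        chosen = top_idx[t]
        weights = probs[t, chosen]
        weights = weights / weights.sum()              # renormalize over chosen experts
        for w, e in zip(weights, chosen):
            out[t] += w * (token @ expert_ws[e])       # only k experts run per token
    return out

# Toy usage: 4 tokens, 8-dim model, 4 experts, top-2 routing.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
gate_w = rng.normal(size=(8, 4))
expert_ws = [rng.normal(size=(8, 8)) for _ in range(4)]
print(moe_forward(tokens, gate_w, expert_ws).shape)   # (4, 8)
```

In a trained model the gate and experts are learned jointly; the point of the sketch is simply that the gating decision, not the number of experts, determines the compute spent per token.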

Sources

A Case Study on Evaluating Genetic Algorithms for Early Building Design Optimization: Comparison with Random and Grid Searches

Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora

Kernel-Level Energy-Efficient Neural Architecture Search for Tabular Dataset

Genetic Algorithm Design Exploration for On-Device Training on FPGAs

ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance

EMAFusion: A Self-Optimizing System for Seamless LLM Selection and Integration

HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving

DataDecide: How to Predict Best Pretraining Data with Small Experiments

Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models

Dense Backpropagation Improves Training for Sparse Mixture-of-Experts

Can Pre-training Indicators Reliably Predict Fine-tuning Outcomes of LLMs?

Mixed Structural Choice Operator: Enhancing Technology Mapping with Heterogeneous Representations

Transferrable Surrogates in Expressive Neural Architecture Search Spaces

Why Ask One When You Can Ask $k$? Two-Stage Learning-to-Defer to a Set of Experts

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
