Efficient Model Optimization and Evaluation

The field of large language models is moving toward more efficient and scalable solutions. Researchers are exploring methods that reduce computational overhead while preserving model performance. One key direction is the development of pruning techniques that shrink model size with little loss of accuracy; a minimal illustration of this idea follows the paper list below. Another important area is the creation of evaluation frameworks that can reliably assess how effective these methods are. Notable papers in this area include:

HoloV, which proposes a holistic visual token pruning framework for efficient inference.

StructPrune, which performs structured global pruning with $\mathcal{O}(\sqrt{N})$ GPU memory.

UniPruning, which provides a unified post-training pruning framework combining local metrics with global feedback.

BMC-LongCLIP, which extends context capacity in biomedical vision-language models.

POME, which enhances fine-tuned language models using a muon-style projection.

TRIM, which introduces a token-wise, attention-derived saliency framework for data-efficient instruction tuning.

VTC-Bench, which provides an evaluation framework for visual token compression methods.
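To make the pruning idea concrete, here is a minimal sketch of post-training magnitude pruning in PyTorch. It is a generic toy illustration, not the algorithm of HoloV, StructPrune, UniPruning, or any other paper listed here; the small `nn.Sequential` model and the `magnitude_prune_` helper are assumptions made for the example.

```python
# Minimal sketch of post-training magnitude pruning (illustrative only;
# not the method of any paper cited in this digest).
import torch
import torch.nn as nn


def magnitude_prune_(linear: nn.Linear, sparsity: float) -> None:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    with torch.no_grad():
        flat = linear.weight.abs().flatten()
        k = int(sparsity * flat.numel())
        if k == 0:
            return
        # k-th smallest absolute value serves as the pruning threshold.
        threshold = torch.kthvalue(flat, k).values
        mask = linear.weight.abs() > threshold
        linear.weight.mul_(mask)


if __name__ == "__main__":
    torch.manual_seed(0)
    # Hypothetical toy model standing in for a much larger LLM.
    model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
    for module in model.modules():
        if isinstance(module, nn.Linear):
            magnitude_prune_(module, sparsity=0.5)
    # Report the achieved sparsity per weight matrix.
    for name, param in model.named_parameters():
        if "weight" in name:
            zeros = (param == 0).float().mean().item()
            print(f"{name}: {zeros:.0%} zeros")
```

Real pruning pipelines such as those surveyed above differ mainly in how the saliency of each weight or token is scored and in whether pruning is unstructured (individual weights, as here) or structured (whole rows, heads, or tokens), but the score-threshold-mask loop is the common skeleton.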

Sources

Don't Just Chase "Highlighted Tokens" in MLLMs: Revisiting Visual Holistic Context Retention

StructPrune: Structured Global Pruning asymptotics with $\mathcal{O}(\sqrt{N})$ GPU Memory

UniPruning: Unifying Local Metric and Global Feedback for Scalable Sparse LLMs

No Tokens Wasted: Leveraging Long Context in Biomedical Vision-Language Models

POME: Post Optimization Model Edit via Muon-style Projection

TRIM: Token-wise Attention-Derived Saliency for Data-Efficient Instruction Tuning

Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods
