The field of large language models (LLMs) is advancing rapidly, with a focus on improving their performance, interpretability, and reliability. Recent research has explored LLMs in applications including natural language processing, materials science, and biomedical research. One notable direction is the development of methods to align LLMs with specialized knowledge, such as Balanced Fine-Tuning, which has shown promising results on biomedical tasks. Another is the identification of error slices in LLMs, which is crucial for understanding and improving their performance; Active Slice Discovery has been proposed as an approach to reduce the amount of manual annotation required. There is also growing interest in evaluating the reliability and trustworthiness of LLMs, with studies investigating paired bootstrap protocols and bias-correction methods.

Noteworthy papers include 'Vector Arithmetic in Concept and Token Subspaces', which demonstrates coherent semantic structure in LLM representation subspaces, and 'Toward Trustworthy Difficulty Assessments', which highlights the challenges of using LLMs as judges in programming and synthetic tasks. Additionally, 'CoreEval' proposes a contamination-resilient evaluation strategy for LLMs, and 'Auxiliary Metrics Help Decoding Skill Neurons in the Wild' introduces a method for isolating neurons that encode specific skills in LLMs.
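To make the paired bootstrap idea concrete, here is a minimal sketch of one such protocol for comparing two models scored on the same examples. The per-example accuracy lists and the function name are hypothetical illustrations, not taken from any of the cited papers.

```python
import random

def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Paired bootstrap: resample example indices with replacement and
    record how often model A's mean score exceeds model B's."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(scores_a)
    observed = sum(scores_a) / n - sum(scores_b) / n
    wins = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        diff = sum(scores_a[i] - scores_b[i] for i in idx) / n
        if diff > 0:
            wins += 1
    return observed, wins / n_resamples

# Hypothetical per-example correctness (1 = correct) for two models
# evaluated on the same ten items.
a = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
b = [1, 0, 1, 0, 1, 0, 0, 1, 0, 1]
obs, frac = paired_bootstrap(a, b)
```

Because resampling is paired (the same index is drawn for both models), per-example difficulty cancels out, which is why this protocol is preferred over two independent bootstraps when both models are run on one shared test set.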
Advances in Large Language Models
Sources
Toward Trustworthy Difficulty Assessments: Large Language Models as Judges in Programming and Synthetic Tasks
CoreEval: Automatically Building Contamination-Resilient Datasets with Real-World Knowledge toward Reliable LLM Evaluation
Efficient Inference Using Large Language Models with Limited Human Data: Fine-Tuning then Rectification
LLMs-Powered Accurate Extraction, Querying and Intelligent Management of Literature derived 2D Materials Data