The field of natural language processing is moving toward a more nuanced understanding of language, with a focus on evaluating and improving the performance of large language models (LLMs) in multilingual settings. Recent research has highlighted the need for more comprehensive and diverse evaluation benchmarks, as well as the importance of considering the social and cultural context of language use. New metrics and frameworks, such as the Single Token Retention Rate (STRR) and the LongQAEval framework, enable more accurate and equitable assessment of LLMs across languages and domains. Studies have also emphasized the need to address the digital epistemic injustice faced by marginalized languages and to develop more inclusive, linguistically informed tokenization strategies.

Noteworthy papers include:

- NarraBench, which presents a comprehensive framework for narrative benchmarking and highlights the need for new evaluations covering overlooked aspects of narrative understanding.
- Invisible Languages of the LLM Universe, which proposes a critical framework for understanding linguistic inequality in AI systems and demonstrates the structural exclusion of marginalized languages.
- Tokenization Disparities as Infrastructure Bias, which conducts a large-scale cross-linguistic evaluation of tokenization efficiency and reveals substantial disparities in computational costs and effective context utilization across languages.
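
The tokenization-efficiency theme is concrete enough to sketch in code. The snippet below is a minimal illustration, not the methodology of any of the papers above: it assumes STRR can be approximated as the fraction of whitespace-delimited words a tokenizer keeps as a single token, and it measures cross-lingual disparity as tokens emitted per input character. The model name and sample sentences are placeholders.

```python
# Minimal sketch of cross-lingual tokenization-efficiency metrics.
# Assumptions (not taken from the papers): STRR is approximated as the
# fraction of whitespace-delimited words kept as exactly one token, and
# disparity is measured as tokens per character. Model/texts are placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")  # placeholder model

samples = {  # placeholder sentences with roughly parallel content
    "en": "The committee approved the new policy yesterday.",
    "sw": "Kamati iliidhinisha sera mpya jana.",
    "hi": "समिति ने कल नई नीति को मंजूरी दी।",
}

def single_token_retention_rate(text: str) -> float:
    """Fraction of words that survive tokenization as a single token."""
    words = text.split()
    kept = sum(
        1 for w in words
        if len(tokenizer.encode(w, add_special_tokens=False)) == 1
    )
    return kept / len(words)

def tokens_per_char(text: str) -> float:
    """Encoding cost: subword tokens emitted per input character."""
    n_tokens = len(tokenizer.encode(text, add_special_tokens=False))
    return n_tokens / len(text)

for lang, text in samples.items():
    print(f"{lang}: STRR~{single_token_retention_rate(text):.2f}, "
          f"tokens/char={tokens_per_char(text):.3f}")
```

Under these assumptions, a language the tokenizer fragments heavily shows a lower retention rate and a higher tokens-per-character value; for the same content that means more compute per request and less usable context window, which is the kind of disparity the cross-linguistic evaluation above quantifies at scale.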