The field of natural language processing is moving toward more inclusive and diverse language models, with a focus on systems that can understand and generate text across multiple languages and dialects. Recent research highlights the importance of domain-specific adaptation and the need for large, high-quality datasets for training and evaluating language models. The development of new benchmarks and evaluation frameworks is another key area of focus, enabling researchers to assess model performance more comprehensively across languages, dialects, and domains (a minimal sketch of this style of evaluation follows the paper list). Notable papers in this area include:

- DialectalArabicMMLU, which introduces a benchmark for evaluating large language models across Arabic dialects.
- HPLT 3.0, which presents a very large-scale multilingual resource for language model pre-training and evaluation.
- AraFinNews, which investigates the impact of domain specificity on abstractive summarisation of Arabic financial texts using large language models.
- PLLuM, which presents a family of large language models tailored specifically to the Polish language.
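
To make the benchmark-evaluation trend concrete, the sketch below shows how an MMLU-style multiple-choice benchmark is typically scored: the model rates each candidate answer, and accuracy is the fraction of items where the top-rated choice matches the label. The `Item` schema and the `score_choice` stub are illustrative assumptions for this sketch, not the actual DialectalArabicMMLU data format or API.

```python
# Minimal sketch of MMLU-style multiple-choice evaluation.
# Item layout and score_choice are hypothetical placeholders,
# not the real DialectalArabicMMLU schema.

from dataclasses import dataclass

@dataclass
class Item:
    question: str
    choices: list[str]  # candidate answers, e.g. four options
    answer: int         # index of the correct choice

def score_choice(question: str, choice: str) -> float:
    """Placeholder for a model's score of a (question, choice)
    pair, e.g. the log-likelihood an LLM assigns to the choice
    given the question. Stubbed here so the sketch runs."""
    return -float(len(choice))  # trivial stub, not a real model

def evaluate(items: list[Item]) -> float:
    """Accuracy: fraction of items where the highest-scoring
    choice matches the labeled answer."""
    correct = 0
    for item in items:
        scores = [score_choice(item.question, c) for c in item.choices]
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == item.answer)
    return correct / len(items)

if __name__ == "__main__":
    demo = [
        Item("2 + 2 = ?", ["3", "4", "5", "22"], answer=1),
        Item("Capital of Poland?", ["Warsaw", "Krakow", "Gdansk"], answer=0),
    ]
    print(f"accuracy = {evaluate(demo):.2f}")
```

In a real harness, `score_choice` would query the model under evaluation, and items would be grouped by dialect or domain so that aggregate accuracy can be broken down along exactly the axes these benchmarks are designed to probe.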