The field of image quality assessment (IQA) is moving toward more explainable and fine-grained analysis. Recent work has focused on new datasets and frameworks that capture rich low-level visual features and correlate them with distortion patterns. Vision Transformers (ViTs) have emerged as a promising alternative to convolutional neural networks (CNNs) for image classification, showing superior performance in kidney stone image classification and face image quality assessment; ViTs have also been explored for facial emotion recognition, although traditional deep learning models still perform better in that area. Noteworthy papers include ViDA-UGC, which establishes a large-scale Visual Distortion Assessment instruction-tuning dataset for user-generated-content (UGC) images; ViT-FIQA, which proposes a ViT-based approach to assessing face image quality; and HiRQA, which introduces a self-supervised, opinion-unaware framework for no-reference image quality assessment.
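To make the ViT-based no-reference IQA pipeline concrete, the sketch below shows the typical inference path such models share: patchify the image into tokens, linearly embed them, attention-pool the tokens, and regress a scalar quality score. This is a minimal illustrative sketch with random weights, not the actual architecture of ViT-FIQA or HiRQA; a real model learns these weights from quality annotations or, in the opinion-unaware setting, via self-supervision.

```python
import numpy as np

def vit_style_quality_score(image, patch=8, d=32, rng=None):
    """Illustrative ViT-style no-reference IQA scorer:
    patchify -> linear embedding -> attention pooling -> scalar score.
    All weights here are random placeholders (an assumption for the
    sketch); trained models learn them from quality supervision."""
    rng = np.random.default_rng(0) if rng is None else rng
    H, W = image.shape
    # 1. Split the grayscale image into non-overlapping patches (tokens).
    patches = image.reshape(H // patch, patch, W // patch, patch)
    tokens = patches.transpose(0, 2, 1, 3).reshape(-1, patch * patch)
    # 2. Linear patch embedding into a d-dimensional token space.
    W_embed = rng.normal(scale=0.02, size=(patch * patch, d))
    x = tokens @ W_embed                      # (n_tokens, d)
    # 3. Attention pooling with a learned query (stand-in for a CLS token):
    #    tokens covering distorted regions can receive higher weight.
    q = rng.normal(size=d)
    attn = np.exp(x @ q / np.sqrt(d))
    attn /= attn.sum()
    pooled = attn @ x                         # (d,)
    # 4. Regression head squashed to a quality score in (0, 1).
    w_head = rng.normal(scale=0.1, size=d)
    return float(1.0 / (1.0 + np.exp(-(pooled @ w_head))))

score = vit_style_quality_score(np.random.default_rng(1).random((64, 64)))
```

The attention-pooling step is the part that supports explainability claims: the per-token weights indicate which image regions drove the predicted score, which is what distortion-aware datasets like ViDA-UGC aim to supervise more directly.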