Image Quality Assessment and Vision Transformers

The field of image quality assessment is moving toward more explainable, fine-grained analysis. Recent studies have focused on developing new datasets and frameworks that capture rich low-level visual features and correlate them with distortion patterns. Vision Transformers (ViTs) have emerged as a promising alternative to traditional convolutional neural networks (CNNs) for image classification tasks, demonstrating superior performance in kidney stone image classification and face image quality assessment. ViTs have also been explored for facial emotion recognition, although traditional deep learning models still perform better in that area. Noteworthy papers include ViDA-UGC, which establishes a large-scale visual distortion assessment instruction-tuning dataset for user-generated content (UGC) images; ViT-FIQA, which proposes a novel approach to assessing face image quality with Vision Transformers; and HiRQA, which introduces a self-supervised, opinion-unaware framework for no-reference image quality assessment.
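To illustrate the opinion-unaware ranking idea behind frameworks like HiRQA, here is a minimal sketch: without any human opinion scores, a model can be supervised by synthetically distorting an image at increasing severities and penalizing the quality scorer whenever a more-distorted version is not ranked below a less-distorted one. The function below is hypothetical and greatly simplified; HiRQA's actual hierarchical formulation differs.

```python
def ranking_loss(scores, margin=0.1):
    """Pairwise margin ranking loss for opinion-unaware IQA (illustrative only).

    scores[i] is the predicted quality of distortion level i, where
    level 0 is the least distorted. Severity order is known by
    construction, so no human labels are needed: the loss penalizes
    any pair whose predicted ordering contradicts that order.
    """
    loss = 0.0
    n_pairs = 0
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            # Level j is more distorted than level i, so we want
            # scores[i] >= scores[j] + margin; hinge on the violation.
            loss += max(0.0, margin - (scores[i] - scores[j]))
            n_pairs += 1
    return loss / max(1, n_pairs)
```

A correctly ordered score sequence such as `[0.9, 0.6, 0.2]` incurs zero loss, while an inverted one is penalized; in practice such a term would be one component of a larger self-supervised training objective.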

Sources

ViDA-UGC: Detailed Image Quality Analysis via Visual Distortion Assessment for UGC Images

Vision Transformers for Kidney Stone Image Classification: A Comparative Study with CNNs

Evaluating Open-Source Vision Language Models for Facial Emotion Recognition against Traditional Deep Learning Models

ViT-FIQA: Assessing Face Image Quality using Vision Transformers

HiRQA: Hierarchical Ranking and Quality Alignment for Opinion-Unaware Image Quality Assessment
