Biometric recognition and analysis is advancing rapidly through the integration of foundation models, particularly Vision-Language Models (VLMs) and Multi-modal Large Language Models (MLLMs). These pre-trained models generalize across diverse tasks, including face verification, iris recognition, and presentation attack detection, with minimal or no task-specific supervision, and current work focuses on improving their accuracy and adaptability across datasets and deployment scenarios. Noteworthy papers in this area include:
- A comprehensive benchmark evaluating the zero-shot and few-shot performance of state-of-the-art VLMs and MLLMs across six biometric tasks, reporting high face verification and iris recognition accuracy without fine-tuning (a minimal sketch of this zero-shot setup follows the list).
- A synthetic generator that produces diverse finger vein patterns and was used to build FingerVeinSyn-5M, the largest available finger vein dataset, yielding an average 53.91% performance gain across multiple benchmarks.
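As a rough illustration of the zero-shot setup referenced above, the sketch below scores a face-verification pair by comparing image embeddings from a pre-trained CLIP encoder and thresholding their cosine similarity. The checkpoint, image paths, and decision threshold are illustrative assumptions, not details taken from the benchmark paper, which evaluates a range of VLMs and MLLMs under its own protocol.

```python
# Minimal sketch: zero-shot face verification with a pre-trained CLIP image encoder.
# The checkpoint name, image paths, and threshold are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

CHECKPOINT = "openai/clip-vit-base-patch32"  # any pre-trained vision backbone would do
model = CLIPModel.from_pretrained(CHECKPOINT)
processor = CLIPProcessor.from_pretrained(CHECKPOINT)
model.eval()


def embed(path: str) -> torch.Tensor:
    """Return an L2-normalized embedding for one face crop."""
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)


def verify(path_a: str, path_b: str, threshold: float = 0.85) -> bool:
    """Declare a match if cosine similarity exceeds the (hypothetical) threshold."""
    similarity = (embed(path_a) @ embed(path_b).T).item()
    return similarity >= threshold


# Example usage with placeholder image paths.
print(verify("face_a.jpg", "face_b.jpg"))
```

Because no parameters are updated, the whole pipeline is "zero-shot": the only tunable quantity is the similarity threshold, which in practice would be calibrated on a held-out set of genuine and impostor pairs.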