Advances in Image Super-Resolution and Text Recovery

The field of image super-resolution is moving toward a more nuanced goal: enhancing image quality while also preserving textual readability. This is evident in frameworks that decouple glyph restoration from image enhancement, achieving both high fidelity and visual consistency. Another line of innovation introduces physically-grounded media interactions, enabling the estimation of medium properties such as water depth and improving reconstruction fidelity. There is also a growing emphasis on achieving high-quality image super-resolution and high-fidelity text recovery together, using vision-language-guided diffusion models and targeted text restoration.

Notable papers:

I2-NeRF proposes a neural radiance field framework that enhances isometric and isotropic metric perception under media degradation.

TIGER introduces a two-stage framework that breaks the trade-off between image quality and textual readability.

SRSR proposes a spatially re-focused super-resolution framework built on two core components: Spatially Re-focused Cross-Attention and Spatially Targeted Classifier-Free Guidance.

GLYPH-SR presents a vision-language-guided diffusion framework that aims to achieve both high-quality image super-resolution and high-fidelity text recovery.
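For context, "classifier-free guidance" is the standard diffusion-model technique of extrapolating from an unconditional noise prediction toward a conditional one. A spatially targeted variant can be sketched by replacing the scalar guidance scale with a per-pixel scale map; note this map-based formulation is an illustrative assumption, not necessarily SRSR's exact method:

```python
import numpy as np

def classifier_free_guidance(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance: extrapolate from the
    unconditional prediction toward the conditional one by `scale`."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

def spatial_cfg(eps_uncond, eps_cond, scale_map):
    """Hypothetical spatially targeted variant: `scale_map` is a
    per-pixel guidance scale, so selected regions (e.g. detected
    text) receive stronger conditioning than the rest of the image."""
    return eps_uncond + scale_map * (eps_cond - eps_uncond)

# Toy example with 4x4 "noise predictions".
eps_u = np.zeros((4, 4))          # unconditional prediction
eps_c = np.ones((4, 4))           # conditional prediction
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0              # hypothetical text-region mask
scale_map = 1.0 + 6.5 * mask      # stronger guidance inside text regions
out = spatial_cfg(eps_u, eps_c, scale_map)
# Background pixels get scale 1.0; text-region pixels get scale 7.5.
```

The per-pixel map reduces to ordinary classifier-free guidance wherever it is constant, which is why the targeted version is a strict generalization of the standard one.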

Sources

Restore Text First, Enhance Image Later: Two-Stage Scene Text Image Super-Resolution with Glyph Structure Guidance

I2-NeRF: Learning Neural Radiance Fields Under Physically-Grounded Media Interactions

SRSR: Enhancing Semantic Accuracy in Real-World Image Super-Resolution with Spatially Re-Focused Text-Conditioning

GLYPH-SR: Can We Achieve Both High-Quality Image Super-Resolution and High-Fidelity Text Recovery via VLM-guided Latent Diffusion Model?
