Advances in Image and Video Generation

The field of image and video generation is advancing rapidly as new architectures and techniques emerge. One key trend is the use of multi-scale and multi-frequency approaches, such as pyramid inputs, adaptive spatial-frequency learning units, and global feature fusion blocks, to enhance features at different scales and improve the quality and realism of generated images and videos. Another focus is diffusion models, which deliver exceptional image-synthesis quality but are computationally intensive; techniques such as progressive quantization, calibration-assisted distillation, and knowledge distillation aim to make these models more efficient without sacrificing fidelity. There is also growing interest in applying reinforcement learning and vision-language models to improve the quality and trustworthiness of generated images and videos. Noteworthy papers in this area include Hunyuan3D 2.5, which generates high-fidelity 3D assets with ultimate details; HiWave, which achieves training-free high-resolution image generation via wavelet-based diffusion sampling; and PQCAD-DM and Diffusion Transformer-to-Mamba Distillation, both notable contributions to efficient, high-quality image generation.
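The multi-frequency idea behind wavelet-based methods such as HiWave can be illustrated with a one-level 2-D Haar transform, which splits an image into a low-frequency approximation band and three high-frequency detail bands. This is only a minimal numpy sketch of the frequency decomposition itself, not HiWave's actual sampling procedure; the function names are illustrative.

```python
import numpy as np

def haar_decompose(img):
    """One level of a 2-D Haar wavelet transform.

    Splits an (H, W) image with even dimensions into a low-frequency
    approximation band (ll) and three high-frequency detail bands.
    """
    # Pair up rows: low band = average, high band = difference.
    lo_r = (img[0::2, :] + img[1::2, :]) / 2.0
    hi_r = (img[0::2, :] - img[1::2, :]) / 2.0
    # Repeat the split along columns of each intermediate band.
    ll = (lo_r[:, 0::2] + lo_r[:, 1::2]) / 2.0   # approximation
    lh = (lo_r[:, 0::2] - lo_r[:, 1::2]) / 2.0   # horizontal detail
    hl = (hi_r[:, 0::2] + hi_r[:, 1::2]) / 2.0   # vertical detail
    hh = (hi_r[:, 0::2] - hi_r[:, 1::2]) / 2.0   # diagonal detail
    return ll, lh, hl, hh

def haar_reconstruct(ll, lh, hl, hh):
    """Invert haar_decompose exactly (avg/diff -> original pairs)."""
    lo_r = np.empty((ll.shape[0], ll.shape[1] * 2))
    lo_r[:, 0::2] = ll + lh
    lo_r[:, 1::2] = ll - lh
    hi_r = np.empty_like(lo_r)
    hi_r[:, 0::2] = hl + hh
    hi_r[:, 1::2] = hl - hh
    img = np.empty((lo_r.shape[0] * 2, lo_r.shape[1]))
    img[0::2, :] = lo_r + hi_r
    img[1::2, :] = lo_r - hi_r
    return img
```

Because the transform is exactly invertible, a sampler can steer low- and high-frequency content separately (e.g. denoise coarse structure in `ll` while preserving detail bands) and still reconstruct a full-resolution image.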
Sources
PQCAD-DM: Progressive Quantization and Calibration-Assisted Distillation for Extremely Efficient Diffusion Model
RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought
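As a rough illustration of the progressive quantization idea mentioned above: weights are pushed through a schedule of decreasing bit-widths rather than quantized to a low precision in one shot. This numpy sketch shows only the bit-width schedule under an assumed uniform symmetric quantizer; the actual PQCAD-DM pipeline additionally interleaves calibration-assisted distillation against the full-precision model, which is omitted here.

```python
import numpy as np

def uniform_quantize(w, bits):
    """Uniform symmetric fake-quantization of a weight tensor to `bits` bits."""
    levels = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits, 7 for 4 bits
    scale = np.abs(w).max() / levels        # assumes w is not all zeros
    q = np.clip(np.round(w / scale), -levels, levels)
    return q * scale                        # de-quantized weights

def progressive_quantize(w, schedule=(8, 6, 4)):
    """Quantize progressively through decreasing bit-widths.

    A real pipeline would run a short calibration/distillation pass after
    each stage; here only the schedule itself is shown.
    """
    out = w
    for bits in schedule:
        out = uniform_quantize(out, bits)
    return out
```

Each stage starts from the previous stage's already-quantized weights, so the precision drop at every step is small, which is what makes the subsequent calibration passes cheap.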