The field of generative models is advancing rapidly, with particular emphasis on improving the quality and coherence of synthesized images and videos. Recent work has produced increasingly capable models that handle complex tasks such as multi-view image generation, panoramic image stitching, and realistic virtual try-on. Trained on large datasets, these models produce high-quality outputs that are often difficult to distinguish from real images, and diffusion-based approaches in particular have achieved state-of-the-art results across a range of applications.

Representative works in this area include LoomNet, which generates consistent multi-view images from a single input image and outperforms state-of-the-art methods on both image-quality and reconstruction metrics; Generative HMC, which leverages large unpaired head-mounted camera captures to directly generate high-quality synthetic images from any conditioning avatar state while properly disentangling the conditioning signal from facial appearance; and Stable-Hair v2, a diffusion-based multi-view hair transfer framework that achieves seamless, view-consistent results and significantly outperforms existing methods.
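As a rough illustration of the conditional diffusion sampling that frameworks such as Stable-Hair v2 build on, the sketch below shows a generic DDPM-style ancestral sampling loop with classifier-free guidance, where a conditioning signal (a source view, an avatar state, a reference hairstyle) steers generation. This is a minimal, self-contained sketch under assumed names: `predict_noise`, `sample_conditional`, the toy noise schedule, and the dummy denoiser are illustrative placeholders, not the implementation of any of the cited works.

```python
import numpy as np

# Hypothetical denoiser: in a real system this would be a trained
# conditional noise-prediction network; here it is a stand-in so the
# sampling loop below runs end to end.
def predict_noise(x_t, t, cond):
    out = 0.1 * x_t
    if cond is not None:
        out = out + 0.01 * cond  # conditioning nudges the prediction
    return out

def sample_conditional(shape, cond, steps=50, guidance_scale=3.0, seed=0):
    """Generic DDPM-style ancestral sampling with classifier-free guidance
    (an illustrative sketch, not any specific paper's method)."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)   # toy noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)           # start from pure noise
    for t in reversed(range(steps)):
        # Blend conditional and unconditional noise estimates to
        # strengthen adherence to the conditioning signal.
        eps_cond = predict_noise(x, t, cond)
        eps_uncond = predict_noise(x, t, None)
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

        # Estimate the mean of the less-noisy sample x_{t-1}.
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])

        # Add noise at every step except the last.
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

# Example: generate a tiny 8x8 single-channel "image" conditioned on a
# dummy reference signal.
reference = np.ones((1, 8, 8))
image = sample_conditional((1, 8, 8), cond=reference)
print(image.shape)  # (1, 8, 8)
```

The multi-view and cross-view consistency claimed by the works above comes from how the conditioning is constructed (e.g., shared reference features or camera-aware inputs), not from the sampling loop itself, which stays essentially as sketched.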