Advancements in Video Processing and Generation

Video processing and generation are advancing rapidly as new models and techniques appear. A key direction is the use of diffusion-based models, which have shown strong results in tasks such as video super-resolution and video generation. These models capture the complex spatial and temporal structure of video data, which enables high-quality outputs. Another significant trend is the use of transformer-based architectures, which have proven effective at modeling the temporal and spatial relationships in video, alongside Mamba-based alternatives. Noteworthy papers include OutDreamer, which introduces a video outpainting framework built on a diffusion transformer, and VSRM, which proposes a robust Mamba-based framework for video super-resolution. In addition, STR-Match, MoMa, and SIEDD advance the state of the art in video editing, video recognition, and video compression, respectively.

Sources

OutDreamer: Video Outpainting with a Diffusion Transformer

VSRM: A Robust Mamba-Based Framework for Video Super-Resolution

STR-Match: Matching SpatioTemporal Relevance Score for Training-Free Video Editing

MoMa: Modulating Mamba for Adapting Image Foundation Models to Video Recognition

SIEDD: Shared-Implicit Encoder with Discrete Decoders

TurboVSR: Fantastic Video Upscalers and Where to Find Them

VMoBA: Mixture-of-Block Attention for Video Diffusion Models

How to Design and Train Your Implicit Neural Representation for Video Compression

FreeLong++: Training-Free Long Video Generation via Multi-band SpectralFusion

Remote Rendering for Virtual Reality: performance comparison of multimedia frameworks and protocols

DAM-VSR: Disentanglement of Appearance and Motion for Video Super-Resolution

SketchColour: Channel Concat Guided DiT-based Sketch-to-Colour Pipeline for 2D Animation

LongAnimation: Long Animation Generation with Dynamic Global-Local Memory

Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching