The field of synthetic image generation and domain adaptation is advancing rapidly, with a focus on making models more efficient and effective in real-world applications. Recent work highlights the potential of large generative models, particularly text-to-image diffusion models, for challenging tasks such as few-shot class-incremental learning and zero-shot domain adaptation. By exploiting their generative capacity, multi-scale representations, and representational flexibility, these models achieve state-of-the-art results on a range of benchmarks. Notably, training-free approaches have been proposed to unlock the zero-shot potential of diffusion transformers, enabling consistent subject synthesis across diverse contexts without additional training. Techniques for refining synthetic image-caption datasets have also emerged, yielding significant gains in zero-shot image captioning. Overall, the field is moving towards more efficient, flexible, and effective models that can adapt to new domains and tasks with minimal supervision. Noteworthy papers include:
- Diffusion-FSCIL, which uses a frozen text-to-image diffusion model as the backbone for few-shot class-incremental learning.
- FreeCus, a training-free framework that activates diffusion transformers' inherent capabilities for authentic subject-driven synthesis.
- SynC, a framework for refining synthetic image-caption datasets to improve zero-shot image captioning.
- SIDA, a zero-shot domain adaptation method that leverages synthetic images to capture complex real-world variations.
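To make the dataset-refinement idea concrete: one simple proxy for this kind of synthetic image-caption cleaning is to score each pair with a cross-modal embedding similarity (e.g. from a CLIP-style dual encoder) and keep only well-aligned pairs. The sketch below is an illustrative assumption, not SynC's actual algorithm; the function name, the threshold, and the toy 2-D embeddings are all hypothetical.

```python
import numpy as np

def filter_caption_pairs(image_embs, caption_embs, threshold=0.5):
    """Keep image-caption pairs whose cosine similarity exceeds a threshold.

    image_embs, caption_embs: (N, D) arrays of embeddings, assumed to come
    from a dual encoder (an illustrative assumption, not SynC's scorer).
    Returns the indices of pairs judged well-aligned.
    """
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    cap = caption_embs / np.linalg.norm(caption_embs, axis=1, keepdims=True)
    scores = np.sum(img * cap, axis=1)  # per-pair cosine similarity
    return np.where(scores >= threshold)[0]

# Toy usage: three pairs, the third deliberately misaligned.
images = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
captions = np.array([[1.0, 0.1], [0.1, 1.0], [-1.0, 0.0]])
kept = filter_caption_pairs(images, captions, threshold=0.5)
print(kept.tolist())  # [0, 1] -- the misaligned pair is dropped
```

In practice, methods in this space go beyond a fixed global threshold (for example, by reassigning captions to better-matching images rather than discarding them), but thresholded alignment filtering conveys the core idea of pruning noisy synthetic pairs before training.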