Advances in Object-Centric Representation and Editing

Computer vision and generative modeling are moving toward finer-grained, more controllable representations of scenes and objects. Recent work adapts powerful pre-trained models for object-centric synthesis, enabling more precise editing and manipulation of images and videos. A key challenge is balancing global scene coherence with disentangled object control; proposed approaches include slot-based conditioning and attention mechanisms. These approaches are reported to achieve state-of-the-art results in object discovery, segmentation, compositional editing, and controllable image and video generation. Notable papers include RefAM, which introduces a simple training-free grounding framework that combines cross-attention maps with attention redistribution, and CrimEdit, which proposes a controllable editing framework for object removal, insertion, and movement. In addition, Learning Object-Centric Representations Based on Slots in Real World Scenarios establishes a general and scalable approach to object-centric generative modeling for images and videos.
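
To make the idea of slots concrete, here is a minimal sketch of one generic slot-attention-style update in NumPy. It is not the method of any paper listed under Sources. The function names, array shapes, and random inputs are assumptions made for illustration, and the learned projections and recurrent update that a trained model would normally use are left out.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slot_attention_step(slots, inputs, eps=1e-8):
    """One simplified update: slots compete for input features.

    slots:  (K, D) array, one vector per candidate object (hypothetical shape).
    inputs: (N, D) array of per-location image features (hypothetical shape).
    """
    d = inputs.shape[1]
    logits = inputs @ slots.T / np.sqrt(d)  # (N, K) similarity scores
    # Softmax over the slot axis: each input location splits its
    # attention across slots, so slots compete for it.
    attn = softmax(logits, axis=1)
    # Normalize per slot so each slot takes a weighted mean of inputs.
    weights = attn / (attn.sum(axis=0, keepdims=True) + eps)
    new_slots = weights.T @ inputs  # (K, D)
    return new_slots, attn


# Illustrative data only: 16 feature vectors, 3 slots, dimension 8.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(16, 8))
slots = rng.normal(size=(3, 8))
for _ in range(3):
    slots, attn = slot_attention_step(slots, inputs)
```

Each row of `attn` sums to one, so every input location is softly assigned to the slots. That competition is what encourages different slots to bind to different objects, and the resulting slot vectors are what a slot-conditioned generator would consume.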

Sources

RefAM: Attention Magnets for Zero-Shot Referral Segmentation

CrimEdit: Controllable Editing for Counterfactual Object Removal, Insertion, and Movement

From Unstable to Playable: Stabilizing Angry Birds Levels via Object Segmentation

Learning Object-Centric Representations Based on Slots in Real World Scenarios
