Advances in Multimodal Representation and Event-Driven Vision

Computer vision and multimodal learning are moving toward more robust and efficient representations of complex data. One line of recent research leverages auxiliary information, such as visual attributes and temporal context, to improve retrieval performance and narrow the semantic gap between modalities. A second line develops event-driven vision methods that efficiently process and represent the asynchronous event streams produced by event cameras. Together, these directions have shown promise in applications including person re-identification, object recognition, and visible-infrared person re-identification. Notable papers include S3CE-Net, which proposes a spike-guided spatiotemporal semantic coupling and expansion network for long-sequence event-based person re-identification, and BiMa, which introduces a framework for mitigating biases in text-video retrieval via scene element guidance. In addition, the FRED dataset gives researchers a resource for studying drone detection, tracking, and trajectory forecasting with event cameras.
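To make "representing an asynchronous event stream" concrete, here is a minimal sketch of one simple and widely used baseline: accumulating events into a per-pixel, per-polarity count frame. It is not the method of any paper listed here. The `(x, y, timestamp, polarity)` tuple layout, the polarity encoding in {0, 1}, and the function name `events_to_count_frame` are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed event layout (hypothetical): (x, y, timestamp, polarity),
# with polarity 0 for a brightness decrease and 1 for an increase.
Event = Tuple[int, int, float, int]


def events_to_count_frame(
    events: Sequence[Event], height: int, width: int
) -> List[List[List[int]]]:
    """Accumulate events into a 2 x height x width grid of counts.

    Channel 0 counts negative-polarity events, channel 1 positive ones.
    Timestamps are ignored, so all timing inside the window is discarded.
    """
    frame = [[[0] * width for _ in range(height)] for _ in range(2)]
    for x, y, _timestamp, polarity in events:
        frame[polarity][y][x] += 1
    return frame


# Three events on a tiny 2 x 3 sensor: two positive events at pixel
# (x=1, y=0) and one negative event at pixel (x=2, y=1).
events = [(1, 0, 0.001, 1), (1, 0, 0.002, 1), (2, 1, 0.003, 0)]
frame = events_to_count_frame(events, height=2, width=3)
```

A count frame like this can be fed to a conventional image model, at the cost of throwing away the fine-grained timing that makes event cameras attractive in the first place.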

Sources

Leveraging Auxiliary Information in Text-to-Video Retrieval: A Review

S3CE-Net: Spike-guided Spatiotemporal Semantic Coupling and Expansion Network for Long Sequence Event Re-Identification

SA-Person: Text-Based Person Retrieval with Scene-aware Re-ranking

Zero-Shot Chinese Character Recognition with Hierarchical Multi-Granularity Image-Text Aligning

Video-Level Language-Driven Video-Based Visible-Infrared Person Re-Identification

Probabilistic Online Event Downsampling

BiMa: Towards Biases Mitigation for Text-Video Retrieval via Scene Element Guidance

ROSA: Addressing text understanding challenges in photographs via ROtated SAmpling

CoLa: Chinese Character Decomposition with Compositional Latent Components

Learning from Noise: Enhancing DNNs for Event-Based Vision through Controlled Noise Injection

EV-Flying: an Event-based Dataset for In-The-Wild Recognition of Flying Objects

Person Re-Identification System at Semantic Level based on Pedestrian Attributes Ontology

Spike-TBR: a Noise Resilient Neuromorphic Event Representation

Beyond Cropped Regions: New Benchmark and Corresponding Baseline for Chinese Scene Text Retrieval in Diverse Layouts

FRED: The Florence RGB-Event Drone Dataset
