Advances in Multimodal Image Processing

The field of computer vision is seeing rapid progress in multimodal image processing, with a focus on improving robustness and accuracy in challenging scenarios. Researchers are integrating complementary modalities, such as visible, infrared, and event-based imagery, to enhance image quality and detection performance. Proposed methods, including weight-space ensembling, adaptive gamma correction, and multimodal transformers, target persistent issues such as modality gaps, cross-modal misalignment, and brightness mismatch. These developments stand to benefit applications including low-light image enhancement, object detection, and autonomous driving. Noteworthy papers include WiSE-OD, which improves cross-modality and corruption robustness in infrared object detection, and ModalFormer, which achieves state-of-the-art low-light image enhancement with a multimodal transformer. Other notable works, MoCTEFuse and LSFDNet, demonstrate strong fusion and detection performance in multi-level infrared-visible image fusion and ship detection, respectively.
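To make the idea of adaptive gamma correction concrete, below is a minimal, generic sketch: gamma is chosen from the image's mean brightness so that the corrected mean approaches a target level. This is an illustrative baseline only, not the TAGC method from the cited paper; the function name and the `target_mean` parameter are assumptions for the example.

```python
import numpy as np

def adaptive_gamma_correct(image: np.ndarray, target_mean: float = 0.5) -> np.ndarray:
    """Brighten a low-light image with a brightness-dependent gamma.

    Generic illustration (not the paper's TAGC): solve for gamma such
    that the corrected mean intensity approaches `target_mean`.
    """
    img = image.astype(np.float64) / 255.0
    mean = max(img.mean(), 1e-6)  # guard against log(0) on black images
    # mean ** gamma == target_mean  =>  gamma = log(target_mean) / log(mean)
    gamma = np.log(target_mean) / np.log(mean)
    corrected = np.clip(img ** gamma, 0.0, 1.0)
    return (corrected * 255.0).astype(np.uint8)
```

A dark input (mean intensity well below 0.5) yields gamma < 1 and is brightened; a bright input yields gamma > 1 and is darkened, so the same rule adapts per image.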

Sources

WiSE-OD: Benchmarking Robustness in Infrared Object Detection

Tuning adaptive gamma correction (TAGC) for enhancing images in low light

UniCT Depth: Event-Image Fusion Based Monocular Depth Estimation with Convolution-Compensated ViT Dual SA Block

Wavelet-guided Misalignment-aware Network for Visible-Infrared Object Detection

GT-Mean Loss: A Simple Yet Effective Solution for Brightness Mismatch in Low-Light Image Enhancement

MoCTEFuse: Illumination-Gated Mixture of Chiral Transformer Experts for Multi-Level Infrared and Visible Image Fusion

ModalFormer: Multimodal Transformer for Low-Light Image Enhancement

LSFDNet: A Single-Stage Fusion and Detection Network for Ships Using SWIR and LWIR

Event-Based De-Snowing for Autonomous Driving
