Advances in Visual Recognition and 3D Mapping

Research in visual recognition and 3D mapping is moving toward models that are both more efficient and more accurate. Recent work uses multi-scale features, capsule networks, and transformer-based architectures to improve image classification, object detection, and 3D reconstruction. Combining multi-scale features with attention mechanisms has shown promising results in capturing complex patterns and relationships in data. 3D mapping of dynamic environments and indoor spaces has also gained attention, with advances in drone-based scanning and human-AI collaborative annotation. Overall, the trend is toward more robust and scalable models that can handle diverse, complex data.

Noteworthy papers include:

MSPCaps proposes a capsule network architecture that integrates multi-scale feature learning with efficient capsule routing, and reports strong scalability and robustness.

MSMVD uses multi-scale image features to generate BEV (bird's-eye-view) features for multi-view pedestrian detection, and reports better detection performance than previous methods.

E-ConvNeXt substantially reduces the parameter count and network complexity of ConvNeXt while maintaining high accuracy, giving a better accuracy-efficiency balance.

Sources

MSPCaps: A Multi-Scale Patchify Capsule Network with Cross-Agreement Routing for Visual Recognition

A Lightweight Convolution and Vision Transformer integrated model with Multi-scale Self-attention Mechanism

M3DMap: Object-aware Multimodal 3D Mapping for Dynamic Environments

VROOM - Visual Reconstruction over Onboard Multiview

MTNet: Learning modality-aware representation with transformer for RGBT tracking

Optimizing Multi-Modal Trackers via Sensitivity-aware Regularized Tuning

Adaptive Visual Navigation Assistant in 3D RPGs

Enhancing compact convolutional transformers with super attention

FlyMeThrough: Human-AI Collaborative 3D Indoor Mapping with Commodity Drones

MSMVD: Exploiting Multi-scale Image Features via Multi-scale BEV Features for Multi-view Pedestrian Detection

E-ConvNeXt: A Lightweight and Efficient ConvNeXt Variant with Cross-Stage Partial Connections

Multi-View 3D Point Tracking
