Physics-Aware Perception in Vision and Graphics

The field of computer vision is moving towards incorporating physical reasoning and geometry into its models, enabling more accurate and robust perception. This is evident in the development of novel physically-grounded visual backbones and in the integration of geometric priors into photometric stereo networks. Another significant trend is improved efficiency and accuracy in depth estimation and visual odometry, with a focus on real-time deployment and robustness under adverse conditions. Notable papers in this area include:

- DPVO-QAT++, which achieves significant reductions in memory footprint and processing time for deep patch visual odometry
- GeoUniPS, which leverages geometric priors for universal photometric stereo
- RTS-Mono, which proposes a real-time self-supervised monocular depth estimation method
- WeSTAR, which enhances the generalization of depth estimation foundation models via weakly-supervised adaptation
- SEC-Depth, which introduces a self-evolution contrastive learning framework for robust depth estimation
- RoMa v2, which presents a novel matching architecture for dense feature matching
- MOMNet, which proposes an alignment-free framework for depth super-resolution
- Lite Any Stereo, which achieves efficient zero-shot stereo matching

Together, these papers demonstrate the progress being made in physics-aware perception and its applications in vision and graphics.
Sources
DPVO-QAT++: Heterogeneous QAT and CUDA Kernel Fusion for High-Performance Deep Patch Visual Odometry
Geometry Meets Light: Leveraging Geometric Priors for Universal Photometric Stereo under Limited Multi-Illumination Cues