The field of computer vision is moving toward more efficient, lightweight models for segmentation and detection, particularly in resource-constrained settings such as edge devices and mobile hardware. To cut computational cost while preserving accuracy, researchers are exploring techniques such as multi-modal bottleneck fusion, calibrated decoder pruning, sparse convolution, and transformer-based architectures. These advances could enable real-time inference and accurate threat assessment in applications spanning security, surveillance, and automotive systems. Notable papers in this area include MOBIUS, which achieves state-of-the-art instance segmentation at reduced computational cost, and ArmFormer, which demonstrates superior accuracy and efficiency in multi-class weapon segmentation. SPLite Hand and An Efficient Semantic Segmentation Decoder likewise report substantial efficiency and accuracy gains for 3D hand pose estimation and semantic segmentation, respectively. Beyond RGB: Leveraging Vision Transformers for Thermal Weapon Segmentation applies vision transformers to thermal weapon segmentation, achieving state-of-the-art results and robust generalization.
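To make one of the named efficiency techniques concrete, below is a minimal PyTorch sketch of one common reading of calibration-based decoder pruning: rank a decoder convolution's output channels by their mean activation on a small calibration batch and keep only the strongest ones. The function name prune_decoder_conv, the keep_ratio parameter, and the ranking criterion are illustrative assumptions; none of this is taken from MOBIUS, ArmFormer, or the other papers mentioned above.

```python
# Minimal sketch: calibration-based channel pruning of a decoder conv layer.
# All names and the ranking criterion are illustrative assumptions, not code
# from any of the papers discussed above.
import torch
import torch.nn as nn

def prune_decoder_conv(conv: nn.Conv2d, calib_inputs: torch.Tensor,
                       keep_ratio: float = 0.5) -> nn.Conv2d:
    """Keep the output channels with the largest mean activation on calibration data."""
    with torch.no_grad():
        acts = conv(calib_inputs)                    # (N, C_out, H, W) responses on the calibration batch
        scores = acts.abs().mean(dim=(0, 2, 3))      # per-channel mean absolute activation
        n_keep = max(1, int(keep_ratio * conv.out_channels))
        keep = torch.topk(scores, n_keep).indices.sort().values

        # Build a slimmer conv that retains only the selected output channels.
        pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                           stride=conv.stride, padding=conv.padding,
                           bias=conv.bias is not None)
        pruned.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
    return pruned

# Usage: prune a 64-channel decoder conv to 32 channels with a small calibration batch.
conv = nn.Conv2d(128, 64, kernel_size=3, padding=1)
calib = torch.randn(8, 128, 32, 32)
slim = prune_decoder_conv(conv, calib, keep_ratio=0.5)
print(slim)  # Conv2d(128, 32, kernel_size=(3, 3), ...)
```

The appeal of this style of pruning for edge deployment is that the slimmer decoder reduces both FLOPs and memory traffic without retraining the encoder, although in practice a short fine-tuning pass is usually needed to recover accuracy.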