The field of robotic manipulation and perception is advancing rapidly, with a focus on making robot interaction with the environment more efficient, generalizable, and robust. Recent work emphasizes 3D geometry-aware policies, multi-view image inputs, and scene graph-based representations for effective manipulation, and there is growing interest in improving the sample efficiency and generalization of robot policies through techniques such as on-manifold exploration, residual off-policy reinforcement learning, and state-aware guided imitation learning.

Notable papers in this area include DIPP, which proposes a discriminative impact point predictor for catching diverse in-flight objects; GP3, which presents a 3D geometry-aware policy operating on multi-view images; Compose by Focus, which introduces a scene graph-based representation for compositional generalization; and FUNCanon, which proposes a framework for learning pose-aware action primitives via functional object canonicalization.

Together, these advances stand to improve robot capabilities across a wide range of applications, from manufacturing and logistics to healthcare and service robotics.
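To make one of the techniques above more concrete, the sketch below illustrates the basic shape of a residual policy: a fixed base controller proposes an action, and a small learned component adds a bounded correction on top of it; in residual off-policy reinforcement learning, that correction would be trained from replayed transitions with an off-policy algorithm. This is a minimal, generic sketch under assumed names (`BasePolicy`, `ResidualPolicy`, the toy proportional controller), not the method of any specific paper cited above.

```python
# Generic residual-policy sketch (illustrative only; names and the toy
# controller are assumptions, not the API of any cited paper).
import numpy as np


class BasePolicy:
    """Hand-crafted or pretrained controller: proportional reach toward a goal."""

    def __init__(self, goal: np.ndarray, gain: float = 0.5):
        self.goal = goal
        self.gain = gain

    def act(self, state: np.ndarray) -> np.ndarray:
        # Simple proportional step toward the goal position.
        return self.gain * (self.goal - state)


class ResidualPolicy:
    """Adds a small learned correction to the base policy's action."""

    def __init__(self, base: BasePolicy, state_dim: int, action_dim: int,
                 scale: float = 0.1, seed: int = 0):
        rng = np.random.default_rng(seed)
        # A tiny linear map standing in for the learned residual network;
        # in practice its weights would be updated by an off-policy RL
        # algorithm (e.g. a SAC- or TD3-style critic/actor update).
        self.W = rng.normal(scale=0.01, size=(action_dim, state_dim))
        self.base = base
        self.scale = scale

    def act(self, state: np.ndarray) -> np.ndarray:
        residual = np.tanh(self.W @ state) * self.scale  # bounded correction
        return self.base.act(state) + residual


if __name__ == "__main__":
    goal = np.array([1.0, 0.5, 0.2])
    policy = ResidualPolicy(BasePolicy(goal), state_dim=3, action_dim=3)
    state = np.zeros(3)
    for _ in range(20):
        state = state + policy.act(state)  # toy integrator "environment"
    print("final state:", state)
```

The appeal of this decomposition is that the base controller keeps behavior reasonable from the start, while the residual only has to learn a small correction, which is one route to the improved sample efficiency highlighted above.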