The field of event-based vision is advancing rapidly, with a focus on improving the recognition of activities and objects across diverse environments. Researchers are exploring approaches that integrate multiple modalities, such as sequence-based and image-based representations, to improve the accuracy and robustness of event-based vision systems. Notably, contrastive alignment and hierarchical asymmetric distillation are gaining attention as means of bridging the spatio-temporal gaps between modalities. In parallel, novel frameworks leveraging semantic information and filter-based reconstruction are advancing the state of the art in event-to-video reconstruction and object tracking. Noteworthy papers include CARE, an end-to-end framework for recognizing activities of daily living (ADL) from event-triggered sensor streams that achieves state-of-the-art performance on multiple datasets, and Semantic-E2VID, which introduces a cross-modal feature alignment module that leverages visual semantic knowledge to enhance event-to-video reconstruction.
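
To make the contrastive alignment idea concrete, below is a minimal sketch of a symmetric InfoNCE-style loss that pulls paired event-stream and image embeddings together while pushing unpaired ones apart. The function name, feature dimensions, and temperature are illustrative assumptions, not details taken from the cited papers.

```python
# A minimal sketch of cross-modal contrastive alignment (InfoNCE /
# CLIP-style), assuming paired event and frame embeddings. All names,
# shapes, and the temperature value are hypothetical.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(event_feats, frame_feats, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    event_feats: (B, D) embeddings from an event-stream encoder.
    frame_feats: (B, D) embeddings from an image encoder; row i of
    each tensor is assumed to describe the same scene.
    """
    # L2-normalize so dot products become cosine similarities.
    event_feats = F.normalize(event_feats, dim=-1)
    frame_feats = F.normalize(frame_feats, dim=-1)

    # (B, B) similarity matrix; diagonal entries are the positive pairs.
    logits = event_feats @ frame_feats.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Align in both directions: event->frame and frame->event.
    loss_e2f = F.cross_entropy(logits, targets)
    loss_f2e = F.cross_entropy(logits.t(), targets)
    return (loss_e2f + loss_f2e) / 2

# Usage with random stand-in features:
ev = torch.randn(16, 256)
fr = torch.randn(16, 256)
loss = contrastive_alignment_loss(ev, fr)
```

The symmetric form (averaging the event-to-frame and frame-to-event terms) is a common design choice in cross-modal alignment, since it encourages both encoders to share an embedding space rather than forcing one modality to match the other.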