The field of event-based vision and multimodal interaction is advancing rapidly, with research concentrating on more efficient and effective methods for processing and understanding complex visual and auditory data. One key direction is the design of novel frameworks and architectures for event-based neural networks, built to handle the sparse, asynchronous nature of event data. Another is the integration of multimodal inputs, such as vision, audio, and text, to enable more natural and engaging human-computer interaction. Notable papers in this area include:

- EVA, a novel A2S framework that generates highly expressive and generalizable event-by-event representations, outperforming prior A2S methods on recognition tasks.
- AW-GATCN, an Adaptive Weighted Graph Attention Convolutional Network that achieves superior accuracy on joint denoising and object recognition for event-camera data (a simplified sketch of graph attention over events follows below).
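To make the graph-attention idea behind approaches like AW-GATCN concrete, the sketch below treats each event-camera event (x, y, t, polarity) as a graph node, connects k nearest neighbors in spatio-temporal space, and applies a single attention layer over each node's neighborhood. This is a minimal illustration, not the authors' method: the `knn_graph` and `EventGraphAttention` names, the feature layout, and all hyperparameters are assumptions for demonstration, and the adaptive weighting and denoising components of the actual paper are omitted.

```python
# Minimal sketch of graph attention over event-camera data (illustrative only;
# not the AW-GATCN authors' code). Assumes events are rows of (x, y, t, polarity).
import torch
import torch.nn.functional as F


def knn_graph(coords: torch.Tensor, k: int) -> torch.Tensor:
    """Return (N, k) indices of each event's k nearest neighbors."""
    dists = torch.cdist(coords, coords)          # (N, N) pairwise distances
    # Take k+1 smallest and drop the first column (each event's distance to itself).
    return dists.topk(k + 1, largest=False).indices[:, 1:]


class EventGraphAttention(torch.nn.Module):
    """One attention layer over an event graph (simplified, hypothetical)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = torch.nn.Linear(in_dim, out_dim, bias=False)
        self.attn = torch.nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, feats: torch.Tensor, nbrs: torch.Tensor) -> torch.Tensor:
        h = self.proj(feats)                      # (N, out_dim) projected features
        h_nbrs = h[nbrs]                          # (N, k, out_dim) neighbor features
        h_self = h.unsqueeze(1).expand_as(h_nbrs) # (N, k, out_dim) repeated self
        # Attention logits from concatenated (self, neighbor) feature pairs.
        logits = self.attn(torch.cat([h_self, h_nbrs], dim=-1)).squeeze(-1)
        alpha = F.softmax(F.leaky_relu(logits), dim=-1)   # (N, k) attention weights
        # Aggregate neighbors weighted by attention.
        return F.elu((alpha.unsqueeze(-1) * h_nbrs).sum(dim=1))


if __name__ == "__main__":
    # Synthetic event stream: columns are (x, y, t, polarity).
    events = torch.rand(256, 4)
    nbrs = knn_graph(events[:, :3], k=8)          # graph built over (x, y, t) only
    layer = EventGraphAttention(in_dim=4, out_dim=32)
    out = layer(events, nbrs)
    print(out.shape)                              # torch.Size([256, 32])
```

Operating on events as graph nodes rather than accumulating them into dense frames preserves the sparsity and fine temporal resolution of event data, which is the core motivation shared by graph-based architectures in this area; a full system would add learned edge weighting and a denoising objective on top of this layer.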