Gesture-based interaction and multimodal event detection are evolving quickly, with a shared focus on more immersive and expressive experiences. Researchers are applying advances in deep learning and computer vision to build systems that recognize and interpret human gestures with accuracy and low latency, enabling new forms of human-computer interaction. In parallel, interest is growing in multimodal event detection, which combines signals from multiple sources, such as text, images, and audio, to detect and interpret events in real time; this is particularly useful in applications like social media monitoring and interactive music systems. Overall, the field is moving toward systems that can understand and respond to human intentions and behaviors.

Noteworthy papers in this area include:
- NeoLightning: a modern reinterpretation of the Buchla Lightning with precise, low-latency gesture recognition and immersive 3D interaction.
- Intentional Gesture: a framework for gesture generation that leverages high-level communicative functions to produce semantically meaningful and temporally aligned gestures.
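
To make the fusion idea concrete, below is a minimal late-fusion sketch in PyTorch, assuming pre-extracted feature vectors for each modality; the module names, dimensions, and number of event classes are illustrative assumptions, not taken from any of the papers above.

```python
import torch
import torch.nn as nn


class LateFusionEventDetector(nn.Module):
    """Toy multimodal event detector: each modality is encoded
    separately, the embeddings are concatenated, and a shared head
    classifies the event. All dimensions here are assumptions."""

    def __init__(self, text_dim=768, image_dim=512, audio_dim=128,
                 hidden_dim=256, num_events=10):
        super().__init__()
        # One small projection per modality (stand-ins for real
        # pretrained text, vision, and audio backbones).
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        # Classifier over the fused (concatenated) representation.
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(3 * hidden_dim, num_events),
        )

    def forward(self, text_feat, image_feat, audio_feat):
        fused = torch.cat([
            self.text_proj(text_feat),
            self.image_proj(image_feat),
            self.audio_proj(audio_feat),
        ], dim=-1)
        return self.classifier(fused)  # unnormalized event logits


if __name__ == "__main__":
    model = LateFusionEventDetector()
    # Random tensors stand in for pre-extracted per-modality features.
    logits = model(torch.randn(4, 768),
                   torch.randn(4, 512),
                   torch.randn(4, 128))
    print(logits.shape)  # torch.Size([4, 10])
```

In practice the random tensors would be replaced by features from pretrained encoders, and richer fusion schemes (early fusion, cross-attention) are common alternatives to simple concatenation.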