The field of language models is moving toward greater personalization and multimodality, with a focus on capturing individual differences and nuances in human behavior. Recent research has highlighted the importance of subjective cognitive diversity in shaping visual attention and language use, and has proposed novel methods for modeling and predicting personalized attention patterns. The use of multimodal data, including video, audio, and text, is becoming increasingly prevalent and is enabling more comprehensive, less biased assessments of human behavior. Notable papers in this area include PRE-MAP, which proposes a saliency model that characterizes personalized visual disparities through reinforcement learning-optimized eye-tracking, and Traits Run Deep, which employs psychology-informed prompts to elicit high-level personality-relevant semantic representations. Additionally, Listening to the Unspoken explores "365" aspects of multimodal interview performance assessment, demonstrating the effectiveness of a comprehensive framework that integrates multiple modalities and evaluation dimensions.
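
To make the psychology-informed prompting idea concrete, the sketch below shows one plausible way to condition prompts on Big Five trait descriptors and stack the resulting text embeddings into a personality-relevant representation. It is a minimal illustration under assumed details: the trait descriptors, the prompt wording, and the `embed_text` encoder are placeholders, not the implementation used in Traits Run Deep.

```python
import numpy as np

# Big Five descriptors used to condition the prompts (illustrative wording, not the paper's).
TRAIT_DESCRIPTORS = {
    "openness": "curiosity, imagination, and preference for novelty",
    "conscientiousness": "organization, diligence, and self-discipline",
    "extraversion": "sociability, assertiveness, and positive energy",
    "agreeableness": "cooperation, empathy, and trust in others",
    "neuroticism": "emotional instability, anxiety, and negative affect",
}


def embed_text(text: str, dim: int = 64) -> np.ndarray:
    """Hypothetical text encoder; a real system would call a pretrained language model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(dim)


def build_prompt(trait: str, descriptor: str, transcript: str) -> str:
    """Psychology-informed prompt: ties the raw transcript to one personality trait."""
    return (
        f"Considering {trait} (associated with {descriptor}), "
        f"summarize what the following interview response reveals about the speaker:\n{transcript}"
    )


def personality_representation(transcript: str) -> np.ndarray:
    """Stack per-trait prompt embeddings into a personality-relevant representation."""
    vectors = [
        embed_text(build_prompt(trait, desc, transcript))
        for trait, desc in TRAIT_DESCRIPTORS.items()
    ]
    return np.stack(vectors)  # shape: (5 traits, embedding dim)


if __name__ == "__main__":
    rep = personality_representation("I enjoy leading team projects and meeting new clients.")
    print(rep.shape)  # (5, 64)
```

In a full multimodal pipeline, representations like this one would typically be fused with audio and video features before the final personality or interview-performance prediction.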