Advances in Large Language Models

The field of large language models (LLMs) is rapidly evolving, with a focus on improving their ability to understand and generate human-like language. Recent developments have highlighted the importance of considering the physical world, cultural context, and social norms when designing and evaluating LLMs. Researchers are exploring new methods for measuring physical-world privacy awareness, cultural conflict, and social bias in LLMs, and are developing more robust and nuanced evaluation benchmarks. Notable papers in this area include 'Measuring Physical-World Privacy Awareness of Large Language Models' and 'CCD-Bench: Probing Cultural Conflict in Large Language Model Decision-Making'. These studies demonstrate the need for more comprehensive and multidisciplinary approaches to LLM development and evaluation.

Sources

Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark

Language, Culture, and Ideology: Personalizing Offensiveness Detection in Political Tweets with Reasoning LLMs

Evaluating Bias in Spoken Dialogue LLMs for Real-World Decisions and Recommendations

A Cross-Lingual Analysis of Bias in Large Language Models Using Romanian History

Self-Improvement in Multimodal Large Language Models: A Survey

Semantic Differentiation in Speech Emotion Recognition: Insights from Descriptive and Expressive Speech Roles

Can LLMs Hit Moving Targets? Tracking Evolving Signals in Corporate Disclosures

Linguistic and Audio Embedding-Based Machine Learning for Alzheimer's Dementia and Mild Cognitive Impairment Detection: Insights from the PROCESS Challenge

Implicit Values Embedded in How Humans and LLMs Complete Subjective Everyday Tasks

SEER: The Span-based Emotion Evidence Retrieval Benchmark

Red Lines and Grey Zones in the Fog of War: Benchmarking Legal Risk, Moral Harm, and Regional Bias in Large Language Model Military Decision-Making

PrivacyMotiv: Speculative Persona Journeys for Empathic and Motivating Privacy Reviews in UX Design

CCD-Bench: Probing Cultural Conflict in Large Language Model Decision-Making

Predicting Stock Price Movement with LLM-Enhanced Tweet Emotion Analysis

Mechanistic Interpretability of Socio-Political Frames in Language Models

FinCall-Surprise: A Large Scale Multi-modal Benchmark for Earning Surprise Prediction

Decoding Emotion in the Deep: A Systematic Study of How LLMs Represent, Retain, and Express Emotion

PsycholexTherapy: Simulating Reasoning in Psychotherapy with Small Language Models in Persian

Autonomy Matters: A Study on Personalization-Privacy Dilemma in LLM Agents

LongTail-Swap: benchmarking language models' abilities on rare words

Good Intentions Beyond ACL: Who Does NLP for Social Good, and Where?

Epistemic Diversity and Knowledge Collapse in Large Language Models

Evaluating Self-Supervised Speech Models via Text-Based LLMS

Empowering Denoising Sequential Recommendation with Large Language Model Embeddings

GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks

Are BabyLMs Deaf to Gricean Maxims? A Pragmatic Evaluation of Sample-efficient Language Models

Social bias is prevalent in user reports of hate and abuse online

LMM-Incentive: Large Multimodal Model-based Incentive Design for User-Generated Content in Web 3.0

Psychological Steering in LLMs: An Evaluation of Effectiveness and Trustworthiness

A New Digital Divide? Coder Worldviews, the Slop Economy, and Democracy in the Age of AI

SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests

Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy (short paper)

Inducing State Anxiety in LLM Agents Reproduces Human-Like Biases in Consumer Decision-Making

EVALUESTEER: Measuring Reward Model Steerability Towards Values and Preference

Reward Model Perspectives: Whose Opinions Do Reward Models Reward?

Fine-Grained Emotion Recognition via In-Context Learning

LLM-Powered Nuanced Video Attribute Annotation for Enhanced Recommendations

Can We Hide Machines in the Crowd? Quantifying Equivalence in LLM-in-the-loop Annotation Tasks

A Framework for Measuring How News Topics Drive Stock Movement

Ethical AI prompt recommendations in large language models using collaborative filtering

Human-aligned AI Model Cards with Weighted Hierarchy Architecture

Does Local News Stay Local?: Online Content Shifts in Sinclair-Acquired Stations

Prompt Optimization Across Multiple Agents for Representing Diverse Human Populations

Making Machines Sound Sarcastic: LLM-Enhanced and Retrieval-Guided Sarcastic Speech Synthesis

Quantifying Data Contamination in Psychometric Evaluations of LLMs

Exposing LLM User Privacy via Traffic Fingerprint Analysis: A Study of Privacy Risks in LLM Agent Interactions