Advances in Speech Tokenization and Reinforcement Learning from Human Feedback

Speech processing and natural language understanding are advancing on two fronts: speech tokenization and reinforcement learning from human feedback (RLHF). Researchers are investigating how frame rate, segmentation, and vocabulary size affect speech tokenization, improving performance on speech recognition and language understanding tasks. Other studies address reward model overoptimisation in RLHF, proposing methods that accelerate training and make rewards fairer. Together, these advances stand to improve the efficiency and effectiveness of speech applications such as automatic speech recognition, text-to-speech, and spoken language models. Noteworthy papers include:

  • Impact of Frame Rates on Speech Tokenizer, which examines how the tokenizer's frame rate affects speech tokenization for Mandarin and English (see the frame-rate sketch after this list).
  • Reward Model Overoptimisation in Iterated RLHF, which presents a comprehensive study of overoptimisation across repeated rounds of RLHF and offers insights for building more stable RLHF pipelines.
  • Accelerating RLHF Training with Reward Variance Increase, which proposes a practical reward adjustment model that accelerates RLHF training by increasing reward variance (see the reward-rescaling sketch after this list).
  • Towards Reward Fairness in RLHF, which frames reward fairness as a resource allocation problem and proposes bias-agnostic methods to mitigate biases in rewards.
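
For intuition on the frame-rate question, here is a minimal Python sketch (not taken from the paper): at a fixed clip length, the tokenizer's frame rate directly sets how many discrete tokens a downstream language model must consume. The function name and the example rates are illustrative assumptions, not values reported in the study.

    def tokens_per_clip(clip_seconds: float, frame_rate_hz: float) -> int:
        """One discrete token per analysis frame at the given frame rate."""
        return int(clip_seconds * frame_rate_hz)

    # Lower frame rates shorten token sequences, trading temporal detail
    # for cheaper language modelling; the rates below are illustrative.
    for rate_hz in (50.0, 25.0, 12.5):
        n = tokens_per_clip(10.0, rate_hz)
        print(f"{rate_hz:5.1f} Hz -> {n} tokens for a 10 s clip")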
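
The reward-variance idea can likewise be illustrated generically. The sketch below rescales centered rewards to amplify their spread within a batch; it is a hypothetical stand-in for intuition only, not the adjustment model proposed in the paper.

    import numpy as np

    def rescale_rewards(rewards: np.ndarray, gain: float = 2.0) -> np.ndarray:
        """Amplify deviations from the batch mean reward.

        Multiplies the batch reward variance by gain**2 while leaving the
        mean unchanged, so preferred responses stand out more sharply.
        This is a generic illustration, not the paper's method.
        """
        mean = rewards.mean()
        return mean + gain * (rewards - mean)

    batch = np.array([0.10, 0.20, 0.25, 0.90])  # made-up reward-model scores
    adjusted = rescale_rewards(batch)
    print(f"variance before: {batch.var():.4f}, after: {adjusted.var():.4f}")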

Sources

Impact of Frame Rates on Speech Tokenizer: A Case Study on Mandarin and English

Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models

Reward Model Overoptimisation in Iterated RLHF

Accelerating RLHF Training with Reward Variance Increase

Towards Reward Fairness in RLHF: From a Resource Allocation Perspective

Spoken Language Modeling with Duration-Penalized Self-Supervised Units
