Advancements in Audio Generation and Evaluation

Audio generation and evaluation are advancing on three fronts: neural audio codecs, automatic prediction of subjective quality, and audio language models. Researchers are analyzing the statistical and linguistic properties of neural audio codecs, with the aim of improving speech recognition and resynthesis. Benchmarking frameworks and challenges, such as the AudioMOS Challenge, support this work by providing a common platform for evaluating and comparing audio codecs and models. Efficient and stable architectures, such as AudioRWKV, make it practical to process long audio sequences and improve performance on audio modeling tasks.

Noteworthy papers in this area include:

Analysing the Language of Neural Audio Codecs, which presents a comparative analysis of the statistical and linguistic properties of neural audio codecs.

Continuous Audio Language Models, which introduces a paradigm for audio generation that represents audio as continuous sequences and reports higher quality at lower computational cost.

Sources

Analysing the Language of Neural Audio Codecs

The AudioMOS Challenge 2025

Parallel Needleman-Wunsch on CUDA to measure word similarity based on phonetic transcriptions

AudioCodecBench: A Comprehensive Benchmark for Audio Codec Evaluation

AudioRWKV: Efficient and Stable Bidirectional RWKV for Audio Pattern Recognition

The First Voice Timbre Attribute Detection Challenge

Continuous Audio Language Models
