This September 2025 paper introduces Diversity-Aware Reinforcement Learning (DARLING), a framework designed to jointly improve the quality and semantic diversity of large language model (LLM) generations. Recognizing that traditional post-training methods often sacrifice diversity for accuracy, DARLING uses a learned partition function to measure semantic diversity beyond surface-level lexical variation. This diversity signal is multiplied with a quality reward during online reinforcement learning, pushing the model toward responses that are both high-quality and distinct. Experiments on non-verifiable tasks, such as creative writing, and verifiable tasks, like competition math, show that DARLING consistently outperforms quality-only RL baselines; on the verifiable tasks it achieves both higher pass@1 (solution quality) and higher pass@k (solution variety). A key finding is that explicitly optimizing for diversity catalyzes exploration in online RL, which in turn yields higher-quality responses overall.
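To make the multiplicative reward combination concrete, here is a minimal Python sketch of a quality-times-diversity reward over a sampled batch. The `same_meaning` predicate is a hypothetical stand-in for the paper's learned partition function, and the inverse-class-size normalization is an illustrative choice, not the paper's exact formulation.

```python
# Minimal sketch of a DARLING-style multiplicative reward (illustrative only).
# `same_meaning` is a hypothetical placeholder for the learned semantic
# partition function described in the paper.

from typing import Callable, List


def diversity_scores(
    responses: List[str],
    same_meaning: Callable[[str, str], bool],
) -> List[float]:
    """Score each response by the size of its semantic equivalence class:
    1.0 for a unique response, shrinking toward 0 as duplicates grow."""
    scores = []
    for r_i in responses:
        # Count batch members (including itself) in the same semantic class.
        class_size = sum(same_meaning(r_i, r_j) for r_j in responses)
        scores.append(1.0 / class_size)
    return scores


def shaped_rewards(
    quality: List[float],
    responses: List[str],
    same_meaning: Callable[[str, str], bool],
) -> List[float]:
    """Multiply each quality reward by the response's diversity score,
    so only answers that are both good and distinct earn full credit."""
    div = diversity_scores(responses, same_meaning)
    return [q * d for q, d in zip(quality, div)]


# Toy usage with exact string match standing in for the learned partition.
if __name__ == "__main__":
    batch = ["use a heap", "use a heap", "sort then scan"]
    quality = [0.9, 0.9, 0.8]
    print(shaped_rewards(quality, batch, lambda a, b: a == b))
    # -> [0.45, 0.45, 0.8]: the duplicated answer is discounted.
```

The multiplicative form matters here: a semantically redundant response sees its reward scaled down even when its quality score is high, which is what drives the exploration effect the paper reports.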
Source:
https://arxiv.org/pdf/2509.02534