Description

https://arxiv.org/html/2505.13379v1

This source introduces Thinkless, a reinforcement learning framework that makes large language models (LLMs) more efficient by letting them decide, per query, whether to produce detailed long-form reasoning or a brief short-form answer. The core innovation is the Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, which separates the learning signal for choosing the reasoning mode from the signal for improving answer accuracy. This decoupling stabilizes training and prevents the model from collapsing into a single reasoning style. Experiments show that Thinkless substantially reduces the use of long-chain reasoning on various benchmarks while maintaining accuracy.
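To make the decoupling concrete, below is a minimal, illustrative Python sketch of a GRPO-style surrogate loss in which the log-probability of the mode-selection token is weighted separately from the log-probabilities of the response tokens. The function names, the per-token averaging, and the rebalancing constant alpha are assumptions for illustration, not the authors' exact formulation from the paper.

import numpy as np

def group_relative_advantages(rewards):
    # GRPO-style advantage: normalize rewards within a group of
    # responses sampled for the same prompt.
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def degrpo_loss(mode_logps, response_logps, rewards, alpha=0.001):
    """Illustrative decoupled surrogate loss.

    mode_logps:     log-prob of the chosen mode token (e.g. short vs. think)
                    for each sampled response in the group, shape (G,).
    response_logps: list of per-token log-prob arrays, one per response body.
    rewards:        scalar reward per response (e.g. answer correctness).
    alpha:          hypothetical constant that rebalances the single mode
                    token against the many response tokens.
    """
    adv = group_relative_advantages(rewards)

    # Mode-selection objective: acts only on the control token.
    mode_term = -np.mean(alpha * adv * np.asarray(mode_logps, dtype=float))

    # Accuracy objective: acts on the response tokens, averaged per response.
    resp_term = -np.mean([a * lp.mean() for a, lp in zip(adv, response_logps)])

    return mode_term + resp_term

# Example usage with a group of 3 sampled responses.
if __name__ == "__main__":
    mode_logps = [-0.2, -1.1, -0.4]
    response_logps = [np.array([-0.5, -0.3]), np.array([-0.9]), np.array([-0.4, -0.6, -0.2])]
    rewards = [1.0, 0.0, 1.0]
    print(degrpo_loss(mode_logps, response_logps, rewards))

Keeping the two terms separate means the gradient on the rare mode token is not swamped by (or does not swamp) the gradient on the much longer response, which is the intuition the summary above attributes to DeGRPO.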