- Test-time scaling improves language model performance using extra compute
- A dataset of 1,000 questions paired with reasoning traces was curated for training
- Budget forcing controls test-time compute by cutting the model's reasoning short or prolonging it
- The model outperformed o1-preview by up to 27% on competition math questions
- The model and data are open-source for public access
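
The budget forcing idea in the list above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a decoding loop in which an early end-of-thinking token is replaced by a continuation cue until a minimum budget is reached, and reasoning is cut off at a maximum budget. The names `budget_forced_reasoning`, `toy_model`, `END_OF_THINKING`, and `WAIT`, along with the token strings themselves, are hypothetical placeholders, and `toy_model` is a stand-in for a real language model.

```python
from typing import Callable, List

# Assumed placeholder strings; a real model would use its own special tokens.
END_OF_THINKING = "</think>"
WAIT = "Wait"


def budget_forced_reasoning(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    min_tokens: int,
    max_tokens: int,
    max_extensions: int = 2,
) -> List[str]:
    """Generate a reasoning trace whose length is steered by a token budget."""
    trace: List[str] = []
    extensions = 0
    while len(trace) < max_tokens:
        token = next_token(prompt + trace)
        if token == END_OF_THINKING:
            # The model wants to stop. If it is under the minimum budget,
            # suppress the stop and append a cue to keep reasoning.
            if len(trace) < min_tokens and extensions < max_extensions:
                trace.append(WAIT)
                extensions += 1
                continue
            break
        trace.append(token)
    # Either the model stopped on its own or the maximum budget was hit;
    # in both cases the reasoning is closed explicitly.
    trace.append(END_OF_THINKING)
    return trace


def toy_model(context: List[str]) -> str:
    """Stand-in model: tries to stop after two consecutive 'step' tokens."""
    recent_steps = 0
    for tok in reversed(context):
        if tok != "step":
            break
        recent_steps += 1
    return END_OF_THINKING if recent_steps >= 2 else "step"


if __name__ == "__main__":
    # Extending: the toy model wants to stop after 2 tokens, the budget asks for 5.
    print(budget_forced_reasoning(toy_model, ["question"], min_tokens=5, max_tokens=10))
    # Truncating: reasoning is cut off after a single token.
    print(budget_forced_reasoning(toy_model, ["question"], min_tokens=0, max_tokens=1))
```

In this sketch the continuation cue counts toward the budget, and `max_extensions` bounds how often an early stop is overridden. Both are design choices made for the illustration, not details taken from the source.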