EP153: [SERA] Training AI coding agents on untested code

Description

The paper introduces SERA (Soft-verified Efficient Repository Agents), a new method for training high-performing open-source coding agents at a fraction of the cost of previous approaches. The researchers aim to bridge the gap between closed-source systems and open-weight models by making it practical to specialize agents to private codebases, allowing them to encode repository-specific patterns directly into their weights.

The core innovation is a pipeline called Soft Verified Generation (SVG), which is built on two key observations:

Soft Verification: Rather than using complex and resource-heavy unit tests to verify synthetic data, SVG uses line-level recall to compare patches generated from two separate rollouts. This removes the need for test infrastructure and allows data generation from any repository regardless of its test coverage.
Vague Instructions: The researchers found that intentionally vague prompts (for example, asking for a change to a randomly chosen function) diversify the training data by encouraging tasks such as refactoring and documentation, which are often more representative of real-world work than simple bug fixes.
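The soft-verification idea can be illustrated with a small line-level recall check between two rollouts' patches. This is a minimal sketch under stated assumptions, not SERA's released code: the helper names (`changed_lines`, `line_recall`, `soft_verified`), treating the first rollout's patch as the reference, the whitespace normalization, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def changed_lines(patch: str) -> list[str]:
    """Return the added/removed lines of a unified diff, whitespace-normalized.

    Simplification: file headers ("--- ", "+++ ") and hunk headers ("@@") are
    skipped by prefix, and context lines (leading space) are ignored.
    """
    lines = []
    for raw in patch.splitlines():
        if raw.startswith(("--- ", "+++ ", "@@")):
            continue
        if raw.startswith(("+", "-")):
            # Keep the +/- sign so an addition and a deletion of the same text differ.
            lines.append(raw[0] + raw[1:].strip())
    return lines


def line_recall(reference_patch: str, candidate_patch: str) -> float:
    """Fraction of the reference patch's changed lines that the candidate also changes."""
    ref = Counter(changed_lines(reference_patch))
    cand = Counter(changed_lines(candidate_patch))
    total = sum(ref.values())
    if total == 0:
        return 0.0  # assumption: an empty reference patch verifies nothing
    overlap = sum((ref & cand).values())  # multiset intersection (min of counts)
    return overlap / total


def soft_verified(reference_patch: str, candidate_patch: str, threshold: float = 0.5) -> bool:
    """Hypothetical filter: keep a sample when recall meets an illustrative threshold."""
    return line_recall(reference_patch, candidate_patch) >= threshold


# Two hypothetical rollouts editing the same function.
rollout_1 = (
    "--- a/app.py\n"
    "+++ b/app.py\n"
    "@@ -1,2 +1,2 @@\n"
    "-def add(a, b):\n"
    "+def add(a: int, b: int) -> int:\n"
    "     return a + b\n"
)
rollout_2 = (
    "--- a/app.py\n"
    "+++ b/app.py\n"
    "@@ -1,2 +1,3 @@\n"
    "-def add(a, b):\n"
    "+def add(a: int, b: int) -> int:\n"
    "+    # Add two integers.\n"
    "     return a + b\n"
)

print(line_recall(rollout_1, rollout_2))  # 1.0
print(line_recall(rollout_2, rollout_1))  # about 0.67: the comment line is unmatched
```

Note that recall is asymmetric: which rollout serves as the reference changes the score, so a real pipeline has to fix that choice. No tests are executed here, which is the point of the approach: agreement between independent rollouts stands in for test execution.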

Key Results and Contributions:

Performance: SERA-32B achieves state-of-the-art results for fully open-source models on SWE-bench Verified, matching or exceeding the performance of strong open-weight models like Devstral-Small-2.
Efficiency: To reach equivalent performance, the method is 26x cheaper than reinforcement learning and 57x cheaper than previous synthetic-data methods. Specializing an agent to a specific codebase (such as Django) requires only about 8,000 samples and costs approximately $1,300.
Repository Specialization: The authors demonstrate that a specialized student model can match or exceed the performance of its teacher model (e.g., GLM-4.5-Air) by learning the specific knowledge of a target repository.
Open Resources: The authors released the SERA model series, the underlying code, and a dataset of 200,000 synthetic trajectories, the largest of its kind for coding agents.
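As a rough check on the specialization figure, dividing the approximate cost by the sample count gives the implied per-sample cost. This assumes the roughly $1,300 covers exactly the 8,000 samples and nothing else, which the summary does not state:

```latex
\frac{\$1{,}300}{8{,}000\ \text{samples}} = \$0.1625 \approx \$0.16\ \text{per sample}
```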

Overall, the paper argues that SERA democratizes coding agent research by significantly lowering the barrier to entry for individual researchers and small teams.