This September 2025 paper introduces WebSailor-V2, an open-source deep research agent developed by Alibaba Group's Tongyi Lab. It describes a post-training pipeline built around two components: SailorFog-QA-V2, a novel scheme for constructing synthetic training data, and a dual-environment reinforcement learning (RL) framework. Built on the Qwen3-30B-A3B model, WebSailor-V2 achieves state-of-the-art performance among open-source agents and is competitive with leading proprietary systems on several benchmarks, including BrowseComp and Humanity's Last Exam. The authors argue that high-quality data and a stable training environment matter more than the specific RL algorithm for developing robust AI agents.
Source: https://arxiv.org/pdf/2509.13305