Description

The paper "SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering" introduces a novel system that uses language models (LMs) to autonomously solve complex software engineering tasks.

The core premise of the paper is that while traditional user interfaces, like the Linux shell, are optimized for human developers, they are poorly suited for LM agents, often leading to inefficient actions, lack of feedback, and context-window flooding. To address this, the authors propose an Agent-Computer Interface (ACI)—an abstraction layer specifically designed around the capabilities and limitations of LMs.
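To make the context-flooding problem concrete, here is a minimal sketch contrasting a shell-style dump of a whole file with a windowed view that shows only a few numbered lines at a time. The function names (`shell_cat`, `windowed_view`), the window size, and the output format are hypothetical illustrations of the ACI idea, not the paper's actual commands or implementation.

```python
from typing import List


def shell_cat(lines: List[str]) -> str:
    """Shell-style output: every line of the file lands in the model's context."""
    return "\n".join(lines)


def windowed_view(lines: List[str], first_line: int, window: int = 5) -> str:
    """Hypothetical ACI-style viewer.

    Shows at most `window` lines starting at `first_line` (1-indexed),
    prefixes each with its line number, and reports how much is hidden.
    """
    total = len(lines)
    start = max(1, min(first_line, total))   # clamp the start into the file
    end = min(total, start + window - 1)     # clamp the end into the file
    out = [f"[showing lines {start}-{end} of {total}]"]
    if start > 1:
        out.append(f"({start - 1} more lines above)")
    out.extend(f"{n}: {lines[n - 1]}" for n in range(start, end + 1))
    if end < total:
        out.append(f"({total - end} more lines below)")
    return "\n".join(out)


# A made-up 200-line file used only for illustration.
demo_file = [f"x{i} = {i}" for i in range(1, 201)]

full = shell_cat(demo_file)                                  # 200 lines of output
view = windowed_view(demo_file, first_line=100, window=5)    # 8 lines of output
print(view)
```

The windowed version returns a bounded amount of text regardless of file size and tells the agent where it is in the file, which is the kind of compact feedback the ACI premise calls for.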

The researchers built SWE-agent, an agent that works through a custom ACI instead of a raw shell.

By operating through this tailored interface, SWE-agent with GPT-4 Turbo achieves state-of-the-art performance. It successfully resolved 12.47% of real-world GitHub issues on the SWE-bench test set and achieved an 87.7% pass@1 rate on the HumanEvalFix benchmark, significantly outperforming prior non-interactive models and standard shell-based agents.