The **technical report**, dated October 30, 2025, details the development and evaluation of **Kimi Linear**, a **hybrid linear attention architecture** for large language models (LLMs). Its core innovation is the **Kimi Delta Attention (KDA)** module, which refines existing linear attention mechanisms to deliver better quality and efficiency than traditional full attention, particularly in **long-context scenarios**. Empirical results from extensive pretraining and fine-tuning experiments show that Kimi Linear **outperforms baselines** across a range of tasks, including general reasoning and code generation, while substantially reducing **memory usage** and increasing **decoding throughput**. The report also includes a **complexity analysis** and a detailed discussion of how KDA relates to other efficient attention and state-space models.
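
To make the "delta attention" idea concrete, the sketch below shows a gated delta-rule linear attention recurrence, the family of mechanisms that KDA builds on. It is a minimal illustration, not the paper's kernel or exact formulation: the function name, tensor shapes, and the per-channel forget gate are assumptions made for this example.

```python
# Minimal sketch of a gated delta-rule linear attention recurrence (single head).
# Illustrative only; shapes, names, and the channel-wise gate are assumptions,
# not the report's exact KDA definition.
import numpy as np

def gated_delta_attention(q, k, v, beta, alpha):
    """Sequential recurrence over one attention head.

    q, k:  (T, d_k)  queries and keys
    v:     (T, d_v)  values
    beta:  (T,)      per-token write strength in [0, 1]
    alpha: (T, d_k)  per-token, per-channel forget gate in [0, 1]
                     (a single scalar gate per token would give the coarser
                     gated delta rule; the finer gate is the assumed refinement)
    Returns per-token outputs of shape (T, d_v).
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v))            # fixed-size recurrent state (key -> value memory)
    out = np.zeros((T, d_v))
    for t in range(T):
        S = alpha[t][:, None] * S       # channel-wise decay of old associations
        # Delta rule: overwrite what the state currently retrieves for k_t
        # with a beta-weighted step toward the new value v_t.
        v_old = S.T @ k[t]              # current retrieval for this key
        S = S + beta[t] * np.outer(k[t], v[t] - v_old)
        out[t] = S.T @ q[t]             # read out with the query
    return out
```

Because the state `S` has a fixed size independent of sequence length, per-token decoding cost and memory stay constant rather than growing with context, which is the mechanism behind the memory and decoding-throughput gains the report measures for its hybrid of linear and full attention layers.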
Source:
https://arxiv.org/pdf/2510.26692