MIT researchers propose compressing LLM context in latent space rather than token space. Using closed-form linear algebra instead of gradient descent, their Attention Matching method achieves 50x KV cache compression in seconds, dramatically outperforming summarization on information-dense tasks such as medical-records QA. We cover the memory wall, why summarization fails, the three-step attention matching algorithm, nonuniform per-head budgets, and what this means for serving long-context models at scale.