"TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate" closes a gap that has been open since Claude Shannon defined the theoretical floor for lossy compression in 1948. For nearly eighty years, practical vector quantization methods have fallen short of what rate-distortion theory says is achievable: they either reach good distortion bounds only through expensive offline training, or run online but pay a quality penalty that grows exponentially with bit depth. TurboQuant comes within a constant factor of about 2.7 of the information-theoretic optimum, requires no training, and runs entirely online at inference time. In practice, this enables LLM KV cache compression to 3.5 bits per channel with no measurable quality degradation, and near-zero indexing overhead for nearest neighbor search.
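To put that 2.7 in perspective, here is a small numeric sketch (my own illustration, not code from the paper): for a unit-variance Gaussian source, Shannon's floor is D(R) = 2^(-2R), and even the best possible scalar quantizer (Lloyd-Max) sits a constant factor above it, approaching pi*sqrt(3)/2 ≈ 2.72 at high rates, which is the same scale as the constant quoted above. The snippet fits a Lloyd-Max codebook on Gaussian samples and prints its MSE against the floor; watch the ratio climb toward ~2.7 as the bit depth grows.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)            # unit-variance Gaussian source

for bits in (1, 2, 3, 4, 5):
    k = 2 ** bits
    # initialize the codebook from evenly spaced sample quantiles
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(50):                     # Lloyd-Max iterations
        edges = (centers[:-1] + centers[1:]) / 2   # cell boundaries
        cells = np.searchsorted(edges, x)          # nearest-center assignment
        centers = np.array([x[cells == c].mean() for c in range(k)])
    # final assignment and distortion under the converged codebook
    edges = (centers[:-1] + centers[1:]) / 2
    cells = np.searchsorted(edges, x)
    mse = np.mean((x - centers[cells]) ** 2)
    shannon = 2.0 ** (-2 * bits)            # D(R) = sigma^2 * 2^(-2R), sigma = 1
    print(f"{bits} bits: MSE {mse:.4f}, Shannon floor {shannon:.4f}, "
          f"ratio {mse / shannon:.2f}x")    # ratio tends toward pi*sqrt(3)/2
```

Note that Lloyd-Max itself is an offline, trained quantizer; the point of the sketch is only to show how small a gap 2.7x is against the Shannon floor, which TurboQuant matches without any training step.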