Kimi's serving architecture, mooncake to offload GPU memory to other chipsets, the ubiquity of vllm, and the growing standard LLM stack