podcastdetails.com

Pretrained

Eating some mooncake

Description

Kimi's serving architecture, how Mooncake offloads GPU memory to other chipsets, the ubiquity of vLLM, and the growing standard LLM stack.
