Description

Pierce and Richard are back for the second listener mailbag. They break down what reward hacking really is and why models so often learn the wrong lesson, explain practical fine-tuning (from pre-training to prompting), unpack why LLMs use tokens instead of words, discuss how context length is a hardware limitation rather than a mathematical one, and much more.