Pierce and Richard are back for the second listener mailbag. They break down what reward hacking really is and why models so often learn the wrong lesson, walk through practical fine-tuning (from pre-training to prompting), unpack why LLMs use tokens instead of words, discuss whether context length is a hardware or a mathematical limitation, and much more.