Description

In this post, 0xkato explains how modern transformer-based LLMs work. The walkthrough covers the core machinery in order: text is turned into token IDs, the IDs are embedded as vectors, positional encoding tracks token order, attention and feed-forward networks process meaning, and the model predicts the next token in a loop. The piece is pitched as an accessible, low-math introduction, showing how shared architecture, trained weights, model configuration, and post-training together shape systems like GPT, Claude, Gemini, and LLaMA.

* 00:00 - Introduction

* 02:40 - Tokenization

* 05:45 - Embeddings

* 08:36 - Positional encoding

* 13:02 - Attention

* 19:10 - Multi-head attention

* 23:39 - Feed-forward network

* 29:03 - Residual stream and layer normalization

* 33:41 - Next-token prediction

* 37:26 - Architecture vs trained weights

* 39:50 - Where this is going

https://www.0xkato.xyz/how-llms-actually-work/

Get full access to Askwho Casts AI at askwhocastsai.substack.com/subscribe