Description

Today we unpack NVIDIA's new blog post on the Nemotron 3 Supermodel and how it powers high-throughput agentic AI. We break down its 1,000,000-token context window; a hybrid mixture-of-experts architecture that routes each task to specialized subnetworks, avoiding full-model compute; a 120B-parameter open model that activates only about 12M parameters at a time; memory-efficient Mamba layers; and multi-token prediction that speeds up inference. We discuss the implications for software and financial agents, how the design reduces context drift and the "thinking tax," and what it all could mean for enterprise AI and everyday workflows. We close with a prompt: what ambitious, world-changing project would you entrust to an autonomous agent?


Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.

Sponsored by Embersilk LLC