Description

What if AI could match enterprise-grade performance at a fraction of the cost? In this episode, we dive deep into DeepSeek, the family of groundbreaking open-source models challenging tech giants at roughly 95% lower cost. From innovative training optimizations to careful data curation, discover how a resource-constrained startup is redefining what's possible in AI.

🎯 Episode Highlights:

Whether you're an ML engineer or an AI enthusiast, learn how clever optimization is democratizing advanced AI capabilities. No GPU farm needed!

References for the main topic:

  1. [2401.02954] DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

  2. [2401.14196] DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

  3. [2405.04434] DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

  4. [2412.19437] DeepSeek-V3 Technical Report

  5. [2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

  6. DeepSpeed: ZeRO-3 Offload (https://www.deepspeed.ai/2021/03/07/zero3-offload.html)

  7. [1910.02054] ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

  8. [2205.05198] Reducing Activation Recomputation in Large Transformer Models

  9. [2406.03488] Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training