Description

What if AI could match enterprise-grade performance at a fraction of the cost? In this episode, we dive deep into DeepSeek, the family of groundbreaking open-source models challenging tech giants at roughly 95% lower cost. From innovative training optimizations to careful data curation, discover how a resource-constrained startup is redefining what's possible in AI.

🎯 Episode Highlights:

Whether you're an ML engineer or an AI enthusiast, learn how clever optimization is democratizing advanced AI capabilities. No GPU farm needed!

References for the main topic:

  1. [2401.02954] DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

  2. [2401.14196] DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

  3. [2405.04434] DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

  4. [2412.19437] DeepSeek-V3 Technical Report

  5. [2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

  6. DeepSpeed: ZeRO-3 Offload (https://www.deepspeed.ai/2021/03/07/zero3-offload.html)

  7. [1910.02054] ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

  8. [2205.05198] Reducing Activation Recomputation in Large Transformer Models

  9. [2406.03488] Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training