Today we are joined by Gorkem and Batuhan from Fal.ai, the fastest-growing generative media inference provider. They recently raised a $125M Series C and crossed $100M ARR. We covered how they pivoted from dbt pipelines to diffusion model inference, which models really changed the trajectory of image generation, and the future of AI video. Enjoy!
Full Video Episode
Timestamps
00:00 - Introductions
04:58 - History of Major AI Models and Their Impact on Fal.ai
07:06 - Pivoting to Generative Media and Strategic Business Decisions
10:46 - Technical discussion on CUDA optimization and kernel development
12:42 - Inference Engine Architecture and Kernel Reusability
14:59 - Performance Gains and Latency Trade-offs
15:50 - Discussion of model latency importance and performance optimization
17:56 - Importance of Latency and User Engagement
18:46 - Impact of Open Source Model Releases and Competitive Advantage
19:00 - Partnerships with closed source model developers
20:06 - Collaborations with Closed-Source Model Providers
21:28 - Serving Audio Models and Infrastructure Scalability
22:29 - Serverless GPU infrastructure and technical stack
23:52 - GPU Prioritization: H100s and Blackwell Optimization
25:00 - Discussion on ASICs vs. General Purpose GPUs
26:10 - Architectural Trends: MMDiTs and Model Innovation
27:35 - Rise and Decline of Distillation and Consistency Models
28:15 - Draft Mode and Streaming in Image Generation Workflows
29:46 - Generative Video Models and the Role of Latency
30:14 - Auto-Regressive Image Models and Industry Reactions
31:35 - Discussion of OpenAI’s Sora and competition in video generation
34:44 - World Models and Creative Applications in Games and Movies
35:27 - Video Models’ Revenue Share and Open-Source Contributions
36:40 - Rise of Chinese Labs and Partnerships
38:03 - Top Trending Models on Hugging Face and ByteDance’s Role
39:29 - Monetization Strategies for Open Models
40:48 - Usage Distribution and Model Turnover on FAL
42:11 - Revenue Share vs. Open Model Usage Optimization
42:47 - Moderation and NSFW Content on the Platform
44:03 - Advertising as a key use case for generative media
45:37 - Generative Video in Startup Marketing and Virality
46:56 - LoRA Usage and Fine-Tuning Popularity
47:17 - LoRA ecosystem and fine-tuning discussion
49:25 - Post-Training of Video Models and Future of Fine-Tuning
50:21 - ComfyUI Pipelines and Workflow Complexity
52:31 - Requests for startups and future opportunities in the space
53:33 - Data Collection and RedPajama-Style Initiatives for Media Models
53:46 - RL for Image and Video Models: Unknown Potential
55:11 - Requests for Models: Editing and Conversational Video Models
57:12 - VO3 Capabilities: Lip Sync, TTS, and Timing
58:23 - Bitter Lesson and the Future of Model Workflows
58:44 - FAL’s hiring approach and team structure
59:29 - Team Structure and Scaling Applied ML and Performance Teams
1:01:41 - Developer Experience Tools and Low-Code/No-Code Integration
1:03:04 - Improving Hiring Process with Public Challenges and Benchmarks
1:04:02 - Closing Remarks and Culture at FAL