Karpenter Lifecycle: How GPU Pods Get Unstuck

Description

A pending ML training job that needs 8 GPUs is a classic Karpenter interview scenario. Here's the four-step lifecycle an interviewer expects you to walk through.

You'll learn:

Why the K8s scheduler marks pods unschedulable and how Karpenter's controller watches for that signal
How Karpenter evaluates all pod constraints at once: resource requests, nodeSelector, nodeAffinity, tolerations, and topology spread
How it selects the right instance type (p3.16xlarge for 8 GPUs) in the correct availability zone and launches it through the EC2 API
Why Karpenter provisions the node but the K8s scheduler still does the final pod binding, a gotcha that trips up a lot of candidates
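
To make the four steps concrete, here is a minimal Python sketch of the lifecycle as a toy model. It is not Karpenter's code and calls no real API: the names PendingPod, InstanceType, CATALOG, pick_instance, and provision_and_bind are hypothetical, the prices are illustrative placeholders, and only a GPU request and a zone nodeSelector are modeled (affinity, tolerations, and topology spread are left out).

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

ZONE_LABEL = "topology.kubernetes.io/zone"  # well-known Kubernetes zone label


@dataclass(frozen=True)
class InstanceType:
    name: str
    gpus: int
    zones: frozenset
    hourly_price: float  # illustrative placeholder, not real pricing


@dataclass
class PendingPod:
    name: str
    gpu_request: int
    node_selector: dict = field(default_factory=dict)
    node_name: Optional[str] = None  # stays None until the pod is bound


# Hypothetical catalog; zone availability here is made up for the example.
CATALOG = [
    InstanceType("p3.2xlarge", 1, frozenset({"us-east-1a", "us-east-1b"}), 3.0),
    InstanceType("p3.8xlarge", 4, frozenset({"us-east-1a", "us-east-1b"}), 12.0),
    InstanceType("p3.16xlarge", 8, frozenset({"us-east-1b"}), 24.0),
]


def pick_instance(pod: PendingPod, catalog) -> Optional[Tuple[str, str]]:
    """Steps 2-3: check every modeled constraint at once, then pick the cheapest fit."""
    zone = pod.node_selector.get(ZONE_LABEL)
    candidates = [
        it for it in catalog
        if it.gpus >= pod.gpu_request and (zone is None or zone in it.zones)
    ]
    if not candidates:
        return None  # nothing satisfies the constraints; the pod stays pending
    best = min(candidates, key=lambda it: it.hourly_price)
    return best.name, zone or sorted(best.zones)[0]


def provision_and_bind(pod: PendingPod, catalog) -> PendingPod:
    # Step 1 (assumed to have happened already): the scheduler found no node
    # with enough GPUs and marked the pod unschedulable.
    choice = pick_instance(pod, catalog)
    if choice is None:
        return pod
    instance, zone = choice
    node = f"{instance}-{zone}"  # stand-in for the node that joins the cluster
    # Step 4: the provisioner only creates capacity. Setting node_name here
    # stands in for the K8s scheduler binding the pod once the node is ready.
    pod.node_name = node
    return pod


if __name__ == "__main__":
    job = PendingPod("train-job", 8, {ZONE_LABEL: "us-east-1b"})
    print(provision_and_bind(job, CATALOG).node_name)
```

In this toy model an 8-GPU pod pinned to us-east-1b ends up on the p3.16xlarge entry, while the same pod pinned to a zone with no 8-GPU capacity stays pending. The point to take into the interview is the split of responsibilities in the last step: provisioning and binding are done by different components.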

Keywords: Karpenter node provisioning, Kubernetes GPU scheduling, pending pods interview question, Karpenter vs cluster autoscaler, K8s scheduler lifecycle

🎧 Listen, then go deeper — DevOps & Cloud interview-prep ebooks at DevOpsInterview.Cloud
