SCENARIO:
You deploy a new ML training job requiring 8 GPUs, but its pods are stuck in Pending. The K8s scheduler reports 'no nodes available'. Walk me
through exactly what Karpenter does to resolve this, step by step.
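Before the walkthrough, it helps to see what "marked unschedulable" actually means on the pod object. A minimal sketch (plain dicts standing in for the Kubernetes API objects; the helper name and sample data are illustrative, not Karpenter's source): the kube-scheduler records its failure as a `PodScheduled` condition with reason `Unschedulable`, and that condition is what a provisioning controller keys off.

```python
# Sketch: detecting a pod the scheduler has given up on (assumed dict shapes,
# not Karpenter's actual implementation).

def is_unschedulable(pod: dict) -> bool:
    """True if the scheduler marked this Pending pod unschedulable."""
    status = pod.get("status", {})
    if status.get("phase") != "Pending":
        return False
    return any(
        cond.get("type") == "PodScheduled"
        and cond.get("status") == "False"
        and cond.get("reason") == "Unschedulable"
        for cond in status.get("conditions", [])
    )

# Illustrative pod from the scenario: 8 GPUs requested, no node can fit it.
pending_pod = {
    "metadata": {"name": "ml-train-0"},
    "status": {
        "phase": "Pending",
        "conditions": [{
            "type": "PodScheduled",
            "status": "False",
            "reason": "Unschedulable",
            "message": "0/5 nodes are available: 5 Insufficient nvidia.com/gpu.",
        }],
    },
}

print(is_unschedulable(pending_pod))  # True
```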
WHAT THEY'RE TESTING: the division of labor between the K8s scheduler and Karpenter, plus Karpenter's 4-step lifecycle
THE ANSWER:
• WATCH: The Karpenter controller watches for pods the K8s scheduler has marked 'unschedulable'
• EVALUATE: Reads ALL constraints from Pod Spec:
 - Resource requests (8 GPUs, memory, CPU)
 - nodeSelector, nodeAffinity, tolerations
 - Topology spread constraints
• PROVISION: Calls the EC2 API (CreateFleet) to launch an instance matching ALL requirements
 - Selects, e.g., a p3.16xlarge (8 GPUs) in the correct zone
 - Applies NodePool's taints, labels, kubelet config
• RESULT: Node joins cluster, K8s scheduler binds the pod
→ Key insight: Karpenter provisions, K8s scheduler still does final binding!