DevOps & Cloud Interview Questions and Answers - Part 1

Midnight Cleanup: Consolidation & Drift (Karpenter)

25 min · 28 Feb 2026

Description

SCENARIO: It's 2 AM and traffic is at 5%, but 50 nodes are running at 10% utilization. Some nodes are also still running an AMI from 3 months ago. How does Karpenter handle both issues?

WHAT THEY'RE TESTING: Consolidation, Drift Detection, Expiration (TTL)

THE ANSWER:

• CONSOLIDATION (underutilized nodes):
    disruption:
      consolidationPolicy: WhenEmptyOrUnderutilized
      consolidateAfter: 30s
  - Karpenter identifies nodes whose pods (judged by resource requests, not live CPU usage) fit on other nodes or on a cheaper replacement
  - Drains those pods onto the remaining capacity, then terminates the emptied nodes

• DRIFT DETECTION (old AMI):
  - Karpenter compares each node against its current NodePool and EC2NodeClass
  - If the AMI the NodeClass resolves to has changed, the node is marked 'drifted'
  - Replaces it gracefully: launches a new node with the correct AMI, then drains and removes the old one

• EXPIRATION (TTL):
    expireAfter: 720h  # force node replacement every 30 days

→ All three are types of DISRUPTION, Karpenter's cleanup mechanism.
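The three mechanisms map onto fields of a Karpenter `NodePool` and its `EC2NodeClass`. Below is a minimal sketch, assuming the Karpenter v1 API (`karpenter.sh/v1`, `karpenter.k8s.aws/v1`) on AWS. The pool name, IAM role name and `my-cluster` discovery tag are hypothetical placeholders, and the disruption budget is illustrative. Note that in v1 `expireAfter` sits under `spec.template.spec`, not under `disruption`.

```yaml
# Sketch for Karpenter v1 on AWS; values marked "hypothetical" are placeholders.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default                                   # hypothetical pool name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                             # must match the EC2NodeClass below
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      expireAfter: 720h                           # EXPIRATION: replace nodes older than 30 days
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized # CONSOLIDATION: empty AND underutilized nodes
    consolidateAfter: 30s                         # wait 30s after pods are added/removed on a node
    budgets:
      - nodes: "10%"                              # illustrative: disrupt at most 10% of nodes at once
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  role: KarpenterNodeRole-my-cluster              # hypothetical IAM role name
  amiSelectorTerms:
    - alias: al2023@latest                        # DRIFT: a new AL2023 release changes the resolved AMI
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster        # hypothetical discovery tag value
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
```

With `alias: al2023@latest`, every new AL2023 AMI release changes what the NodeClass resolves to, so older nodes are marked drifted and rolled within the budget. Pinning a specific AMI version in the alias instead makes AMI upgrades an explicit change to the NodeClass.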


All Episodes

12 episodes


Unstuck: The Karpenter Lifecycle

SCENARIO: You deploy a new ML training job requiring 8 GPUs, but its pods are stuck in Pending. The K8s scheduler reports 'no nodes available'. Walk me through exactly what Karpenter does to resolve this, step by step.

WHAT THEY'RE TESTING: The K8s scheduler's role vs Karpenter's role, the 4-step lifecycle

THE ANSWER:

• WATCH: The Karpenter controller watches for pods that the K8s scheduler has marked 'unschedulable'
• EVALUATE: Reads ALL constraints from the Pod Spec:
  - Resource requests (8 GPUs, memory, CPU)
  - nodeSelector, nodeAffinity, tolerations
  - Topology spread constraints
• PROVISION: Calls the AWS EC2 API to launch an instance that satisfies ALL requirements
  - Selects an 8-GPU instance type such as p3.16xlarge in the correct zone
  - Applies the NodePool's taints, labels, and kubelet config
• RESULT: The node joins the cluster and the K8s scheduler binds the pod to it

→ Key insight: Karpenter provisions the node, but the K8s scheduler still does the final binding!

39 min · 26 Jan 2026
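To make the EVALUATE and PROVISION steps concrete, here is a hedged sketch of a GPU `NodePool` together with a pod it could serve, assuming Karpenter v1 on AWS, an existing `EC2NodeClass` named `default`, and the NVIDIA device plugin running on GPU nodes so that a new node advertises `nvidia.com/gpu`. The pool name `gpu`, the taint, the pod name and the image are hypothetical. Karpenter reads the pod's 8-GPU request, its node selector and its toleration, and launches an allowed instance type that satisfies all of them (p3.16xlarge is one such 8-GPU type).

```yaml
# Sketch for Karpenter v1 on AWS; names marked "hypothetical" are placeholders.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu                                  # hypothetical pool name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                        # assumed to already exist
      requirements:
        - key: karpenter.k8s.aws/instance-gpu-count
          operator: Gt
          values: ["7"]                      # only instance types with 8+ GPUs
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      taints:                                # PROVISION: applied to every node from this pool
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule
---
# A bare Pod standing in for one pod of the training job.
apiVersion: v1
kind: Pod
metadata:
  name: ml-training                          # hypothetical pod name
spec:
  nodeSelector:
    karpenter.sh/nodepool: gpu               # EVALUATE: label Karpenter sets on its nodes
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  containers:
    - name: trainer
      image: my-registry/trainer:latest      # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 8                  # EVALUATE: the 8-GPU request
```

Once the new node registers and the device plugin reports 8 allocatable GPUs, it is kube-scheduler, not Karpenter, that binds the pending pod to the node.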