DevOps & Cloud Interview Questions and Answers - Part 1

Midnight Cleanup: Consolidation & Drift (Karpenter)

25 min · Feb 28, 2026

Description

SCENARIO: It's 2 AM, traffic is at 5%, but we have 50 nodes running at 10% utilization. Also, some nodes are running an old AMI from 3 months ago. How does Karpenter handle both issues?

WHAT THEY'RE TESTING: Consolidation, Drift Detection, Expiration (TTL)

THE ANSWER:

• CONSOLIDATION (underutilized nodes):
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
  - Karpenter identifies low-utilization nodes
  - Drains pods to other nodes, terminates empty ones
• DRIFT DETECTION (old AMI):
  - Karpenter compares node spec vs current NodeClass
  - If AMI changed in NodeClass, node is marked 'drifted'
  - Gracefully replaces with new node running correct AMI
• EXPIRATION (TTL):
  expireAfter: 720h # Force refresh every 30 days

→ All three are types of DISRUPTION - Karpenter's cleanup mechanism
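As a rough illustration, the consolidation and expiration settings from the answer could sit together in a single NodePool manifest. This is a minimal sketch that assumes the karpenter.sh/v1 API on AWS. The names "default" and the single architecture requirement are placeholders, not taken from the episode. Drift needs no NodePool field of its own in this sketch: it is triggered by changing the referenced EC2NodeClass (for example its AMI selection).

```yaml
# Sketch only: assumes the karpenter.sh/v1 API; all names are placeholders.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default                  # hypothetical NodePool name
spec:
  template:
    spec:
      nodeClassRef:              # nodes are checked for drift against this NodeClass
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default            # hypothetical EC2NodeClass name
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      # EXPIRATION (TTL): in v1 this field sits under template.spec
      # (in v1beta1 it was under spec.disruption).
      expireAfter: 720h          # force refresh every 30 days
  disruption:
    # CONSOLIDATION: consider both empty and underutilized nodes
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
```

The exact field placement differs between Karpenter API versions, so check it against the version installed in the cluster before applying anything like this.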


All episodes

12 episodes


Midnight Cleanup: Consolidation & Drift (Karpenter)


Feb 28, 2026 · 25 min

Unstuck: The Karpenter Lifecycle

SCENARIO: You deploy a new ML training job requiring 8 GPUs, but pods are stuck in Pending. The K8s Scheduler logs show 'no nodes available'. Walk me through exactly what Karpenter does to resolve this, step by step.

WHAT THEY'RE TESTING: K8s Scheduler vs Karpenter's role, the 4-step lifecycle

THE ANSWER:

• WATCH: Karpenter controller watches for pods marked 'unschedulable' by K8s scheduler
• EVALUATE: Reads ALL constraints from Pod Spec:
  - Resource requests (8 GPUs, memory, CPU)
  - nodeSelector, nodeAffinity, tolerations
  - Topology spread constraints
• PROVISION: Calls AWS EC2 API to launch instance matching ALL requirements
  - Selects p3.16xlarge (8 GPUs) in correct zone
  - Applies NodePool's taints, labels, kubelet config
• RESULT: Node joins cluster, K8s scheduler binds the pod

→ Key insight: Karpenter provisions, K8s scheduler still does final binding!

Jan 26, 2026 · 39 min
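For the pending-pod scenario in this episode, the constraints Karpenter reads in the EVALUATE step might look like the Pod below. This is a hedged sketch, not the episode's own manifest: the pod name, image, CPU and memory figures are hypothetical. The toleration assumes the GPU NodePool taints its nodes with nvidia.com/gpu, and the nvidia.com/gpu resource assumes the NVIDIA device plugin runs on the provisioned nodes.

```yaml
# Sketch only: names, image, and sizes are hypothetical placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: ml-training                        # hypothetical
spec:
  restartPolicy: Never
  nodeSelector:
    karpenter.sh/capacity-type: on-demand  # well-known Karpenter label
  tolerations:
    - key: nvidia.com/gpu                  # assumes the NodePool taints GPU nodes this way
      operator: Exists
      effect: NoSchedule
  containers:
    - name: trainer
      image: registry.example.com/ml/trainer:latest   # hypothetical image
      resources:
        requests:
          cpu: "16"                        # illustrative values
          memory: 128Gi
        limits:
          nvidia.com/gpu: 8                # the 8-GPU request that leaves the pod Pending
```

Until a node exposing 8 GPUs exists, the scheduler marks this pod unschedulable, which is the signal the WATCH step reacts to. Once the new node joins, the scheduler (not Karpenter) binds the pod to it.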