Requests, Limits, and the Throttling Trap: K8s Resources for Node.js

28 min · May 27, 2026

Description

You set a CPU limit on your pod, the node has plenty of capacity to spare, and yet your Node.js service is throttled to a crawl. How does that happen? The answer lives deep in the Linux kernel, in the CFS bandwidth controller and cgroups, and most teams never look there. Requests and limits are not two knobs for the same thing. One drives scheduling, the other enforces a hard quota, and confusing them is how you end up paying for CPU you can never actually use. So how do you size them right for an event-loop runtime?

In this episode of The Node (and more) Banter, Luca Maraschi and Matteo Collina break down how Kubernetes CPU and memory allocation actually works, what requests and limits really do at the kernel level, and why the defaults quietly sabotage Node.js workloads.

In this episode, we cover:

✅ What CPU requests and limits actually mean, and why they are not interchangeable.
✅ How CFS quota and cgroups cause throttling even when the node is mostly idle.
✅ The three QoS classes (Guaranteed, Burstable, BestEffort) and which one fits a Node.js service.
✅ Why a single-threaded runtime makes limit sizing trickier than it looks, and the patterns that avoid silent throttling.

The takeaway? Kubernetes will give you exactly what you asked for, including the throttling you did not mean to ask for. Get requests and limits right and your Node.js services run predictably. Get them wrong and you will burn money on capacity the scheduler will never let you touch.
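To make the throttling arithmetic concrete, here is a minimal sketch in TypeScript. It is not taken from the episode. It assumes the kernel's default CFS period of 100 ms, that a CPU limit in millicores translates into a per-period quota of `limit / 1000 × period`, and that the work starts at the beginning of a period and splits evenly across busy threads. The function names `quotaPerPeriodMs` and `wallClockMs` are hypothetical, and the model ignores real scheduler details such as quota slices and burst settings.

```typescript
// Default CFS bandwidth period on Linux: 100 ms (assumed here; it is tunable).
const CFS_PERIOD_MS = 100;

/** CPU time (ms) a container may consume per CFS period for a limit given in millicores. */
function quotaPerPeriodMs(limitMillicores: number, periodMs: number = CFS_PERIOD_MS): number {
  return (limitMillicores / 1000) * periodMs;
}

/**
 * Simplified model: wall-clock time (ms) needed to finish `cpuWorkMs` of CPU work
 * spread evenly over `threads` busy threads under a CPU limit.
 * Once the quota for a period is used up, the container waits for the next period.
 */
function wallClockMs(
  cpuWorkMs: number,
  limitMillicores: number,
  threads: number = 1,
  periodMs: number = CFS_PERIOD_MS,
): number {
  const quota = quotaPerPeriodMs(limitMillicores, periodMs);
  if (quota <= 0 || threads < 1) {
    throw new Error("limit must be positive and threads must be at least 1");
  }
  let remaining = cpuWorkMs;
  let elapsed = 0;
  while (remaining > 0) {
    // CPU time usable in this period: capped by the quota and by what the threads can burn.
    const usable = Math.min(quota, threads * periodMs, remaining);
    remaining -= usable;
    if (remaining > 0) {
      elapsed += periodMs; // quota exhausted: throttled until the period ends
    } else {
      elapsed += usable / threads; // work finishes partway through the period
    }
  }
  return elapsed;
}
```

Under this model, `wallClockMs(80, 500)` returns 130: a single-threaded 80 ms burst under a 500m limit runs for 50 ms, sits throttled for the remaining 50 ms of the period, then finishes the last 30 ms. With a 1000m limit the same burst takes 80 ms. The throttling happens regardless of how idle the rest of the node is, because the quota is enforced per container.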


All episodes

61 episodes

Requests, Limits, and the Throttling Trap: K8s Resources for Node.js

May 27, 2026 · 28 min

Should We Rewrite Node.js in Rust?

Bun made the switch. Zig is out, Rust is in, and AI handled most of the work, with 98% of the test suite passing. The question is no longer hypothetical; it's real now. If an AI can port an entire runtime, why are so many enterprise teams still stuck on a Node 12 codebase they're afraid to update? In this episode of The Node (and more) Banter, Luca Maraschi and Matteo Collina talk about the Bun Zig-to-Rust port, including the memory leaks that led to the change, the rumors around it, and what it means that AI made it happen. They also look at the bigger picture: meta-cloud platforms losing their advantage, Node.js downloads passing 680 million a month thanks to AI tools, and why major AI companies still don't have a seat on the Node.js TSC, even though they build billions of dollars of products on it. In this episode, we cover: ✅ Why Bun is moving from Zig to Rust, and why memory safety matters more than the drama ✅ How AI managed to port a full runtime with 98% of tests passing, and what made this possible ✅ The key to AI-powered migrations: integration tests that focus on results, not how things are built ✅ Can enterprise teams do the same, and what does upgrading from Node 12 to Node 24 with AI really look like? The takeaway? The real breakthrough wasn't the model, but the test suite. Without integration tests, there is no migration, whether you use AI or not. That's the lesson hidden in the Bun story, and it's one most teams will miss while debating Rust versus Zig.

May 20, 2026 · 29 min

We Ran DOOM in a Node.js Terminal. Now There's No Excuse for Your Legacy Native Code (with Paolo Insogna)

What started as a joke at the Node Collaborator Summit turned into the most compelling argument yet for why enterprises have no excuse left to avoid modernizing their native code. In this episode of The Node (and more) Banter, Luca and Matteo are joined by Paolo, Principal Software Engineer at Platformatic, who built "Project Destino" — because in Italian, destino means doom, and yes, that's exactly how we name things. That comment in London turned into a fully working DOOM port running at 35fps inside your terminal, with sound, powered entirely by Node.js FFI, OpenTUI, and a C library called DOOM Generic. In this episode, we cover: ✅ How Node.js's native FFI module lets you load and run any C library. No native addons, no compilation headaches ✅ Why the game loop lives in JavaScript (via setInterval) while the engine ticks happen across the FFI boundary ✅ The FFI performance story: from 150 nanoseconds per call down to 15, close to the theoretical minimum ✅ Node.js Single Executable Applications (SEA): ship everything — game, sound, native libraries — as one binary ✅ The enterprise reality: if FFI can run DOOM, it can run your legacy DLLs — and there's no migration excuse left ✅ What's next: llama.cpp via FFI, NVIDIA GPU experiments — and possibly Prince of Persia The takeaway? We didn't port DOOM because it made sense. We did it because the technology made it possible — and that's exactly the point. Node.js FFI changes the migration conversation for every enterprise sitting on legacy native code. If it runs DOOM, it runs your C library. No excuses.

May 13, 2026 · 36 min

Predictive Autoscaling for Node.js: Why Reactive Systems Are Costing You More Than You Think

Many teams believe autoscaling is simple: set a CPU threshold and let Kubernetes handle it. But if you get a three-minute traffic spike and your pods need two minutes to start, users feel the lag, the spike ends, and your new pods show up too late. What if your infrastructure could predict traffic instead of just reacting? In this episode, Luca Maraschi and Matteo Collina challenge the usual autoscaling approach—HPA, KEDA, ECS, and more. They explain the predictive scaling algorithm Platformatic created for their Intelligent Command Center (ICC). Matteo explains why scaling is a nonlinear problem that the industry keeps trying to solve with linear solutions, and how thinking of a distributed system like a neural network can change how scaling decisions are made. In this episode, we cover: ✅ Why reactive autoscalers always use outdated data by design, and why this is a core flaw, not just a configuration issue ✅ The real cost of pod boot time that often gets ignored: spawn time, warmup, and traffic rebalancing ✅ Why CPU and memory are not the right metrics for Node.js, and what you should measure instead ✅ How Platformatic's algorithm checks event loop utilization from outside the thread, with no interference and no extra overhead ✅ The benchmark results: 99.47% success rate compared to 95% with KEDA and 90% with HPA, with P99 latency ten times better ✅ The 46-page white paper they published, and why they believe it's time to stop scaling out of fear The takeaways? Over-provisioning is not a safety net; it shows the model is broken. If your system cannot predict load, you will always have to choose between wasting money and getting worse performance. This episode, along with the white paper, argues that smarter scaling is now essential, not just a bonus.

May 6, 2026 · 36 min

Sandboxing AI Agents in Kubernetes: Why Regina Uses eBPF and Not a VM

Running AI agents in production isn't just about picking the right LLM. It's about the infrastructure decisions that make them safe, fast, and deployable at scale. Choosing between micro VMs, gVisor, Firecracker, or eBPF sounds like a systems engineering rabbit hole, until you realize the wrong choice can mean seconds of startup latency, infrastructure bloat, or an isolation model that doesn't match your actual threat surface. In this episode of The Node (and more) Banter, Luca Maraschi and Matteo Collina go deep on the architecture behind Regina, Platformatic's AI agent sandbox, and explain exactly why they chose eBPF over traditional VM-based isolation. We unpack the two fundamental agent sandbox patterns, the trade-offs between logical and physical isolation, and why Node.js turned out to be a surprisingly perfect fit for systems-level eBPF work. We'll explore: ✅ eBPF vs. micro VMs: startup latency, infrastructure complexity, and where the real boundary sits ✅ Process-level vs. container-level isolation and why granularity changes everything for agents ✅ The snowflake problem: managing stateful agents across Kubernetes pod restarts ✅ How syscall and network policies are enforced at runtime, per agent process ✅ Why Node.js is a natural fit for eBPF and why they built their own stack instead of using OpenCilium The big picture? Infrastructure shapes safety. If you're building or deploying AI agents on Kubernetes, this episode gives you the mental model for why isolation at the process level, not the container level, matters and why Node.js can hold its own as a systems language when the architecture is right.

April 29, 2026 · 31 min
