Iniciar sesión

The Node (and more) Banter

The Node (and more) Banter

How Do We Scale Rate Limiting in Node.js?

30 min · 3 de jun de 2026

Portada del episodio How Do We Scale Rate Limiting in Node.js?

Descripción

Your Node.js service has rate limiting. You tested it, it worked, you shipped it. But your limit lives in the memory of a single process, and the moment you run more than one instance behind a load balancer, every replica enforces its own private budget. Ten instances means ten times the traffic you thought you were allowing. That is not a smaller problem at scale, it is a different problem entirely. So what does rate limiting that actually holds across a cluster look like? In this episode of The Node (and more) Banter, Luca Maraschi and Matteo Collina dig into rate limiting for Node.js applications, why the obvious approach quietly fails in production, and how to build limits that survive horizontal scaling. In this episode, we cover: ✅ The mechanics of in-process rate limiting and the trap of the in-memory store. ✅ Why per-instance counters cannot enforce a global limit once you scale out. ✅ How to back rate limiting with a shared store like Redis or Valkey, and the patterns that actually work. ✅ The operational realities: clock skew across nodes, what happens when your shared store goes down, and the consistency-versus-performance balance you have to choose. The takeaway? A rate limit is a promise about your whole system, not one process. If it only holds on a single instance, it is lying to you. Get distributed rate limiting right, and Node.js handles abuse and traffic spikes gracefully. Get it wrong, and you will find out the hard way, in production, under load.

Comentarios

0

Sé la primera persona en comentar

¡Regístrate ahora y únete a la comunidad de The Node (and more) Banter!

Todos los episodios

62 episodios

How Do We Scale Rate Limiting in Node.js?

Your Node.js service has rate limiting. You tested it, it worked, you shipped it. But your limit lives in the memory of a single process, and the moment you run more than one instance behind a load balancer, every replica enforces its own private budget. Ten instances means ten times the traffic you thought you were allowing. That is not a smaller problem at scale, it is a different problem entirely. So what does rate limiting that actually holds across a cluster look like? In this episode of The Node (and more) Banter, Luca Maraschi and Matteo Collina dig into rate limiting for Node.js applications, why the obvious approach quietly fails in production, and how to build limits that survive horizontal scaling. In this episode, we cover: ✅ The mechanics of in-process rate limiting and the trap of the in-memory store. ✅ Why per-instance counters cannot enforce a global limit once you scale out. ✅ How to back rate limiting with a shared store like Redis or Valkey, and the patterns that actually work. ✅ The operational realities: clock skew across nodes, what happens when your shared store goes down, and the consistency-versus-performance balance you have to choose. The takeaway? A rate limit is a promise about your whole system, not one process. If it only holds on a single instance, it is lying to you. Get distributed rate limiting right, and Node.js handles abuse and traffic spikes gracefully. Get it wrong, and you will find out the hard way, in production, under load.

3 de jun de 202630 min

Requests, Limits, and the Throttling Trap: K8s Resources for Node.js

You set a CPU limit on your pod, the node has plenty of capacity to spare, and yet your Node.js service is throttled to a crawl. How does that happen? The answer lives deep in the Linux kernel, in the CFS bandwidth controller and cgroups, and most teams never look there. Requests and limits are not two knobs for the same thing. One drives scheduling, the other enforces a hard quota, and confusing them is how you end up paying for CPU you can never actually use. So how do you size them right for an event-loop runtime? In this episode of The Node (and more) Banter, Luca Maraschi and Matteo Collina break down how Kubernetes CPU and memory allocation actually works, what requests and limits really do at the kernel level, and why the defaults quietly sabotage Node.js workloads. In this episode, we cover: ✅ What CPU requests and limits actually mean, and why they are not interchangeable. ✅ How CFS quota and cgroups cause throttling even when the node is mostly idle. ✅ The three QoS classes (Guaranteed, Burstable, BestEffort) and which one fits a Node.js service. ✅ Why a single-threaded runtime makes limit sizing trickier than it looks, and the patterns that avoid silent throttling. The takeaway? Kubernetes will give you exactly what you asked for, including the throttling you did not mean to ask for. Get requests and limits right and your Node.js services run predictably. Get them wrong and you will burn money on capacity the scheduler will never let you touch.

27 de may de 202628 min

Should We Rewrite Node.js in Rust?

Bun made the switch. Zig is out, Rust is in, and AI handled most of the work, with 98% of the test suite passing. The question is no longer hypothetical; it's real now. If an AI can port an entire runtime, why are so many enterprise teams still stuck on a Node 12 codebase they're afraid to update? In this episode of The Node (and more) Banter, Luca Maraschi and Matteo Collina talk about the Bun Zig-to-Rust port, including the memory leaks that led to the change, the rumors around it, and what it means that AI made it happen. They also look at the bigger picture: meta-cloud platforms losing their advantage, Node.js downloads passing 680 million a month thanks to AI tools, and why major AI companies still don't have a seat on the Node.js TSC, even though they build billions of dollars of products on it. In this episode, we cover: ✅ Why Bun is moving from Zig to Rust, and why memory safety matters more than the drama ✅ How AI managed to port a full runtime with 98% of tests passing, and what made this possible ✅ The key to AI-powered migrations: integration tests that focus on results, not how things are built ✅ Can enterprise teams do the same, and what does upgrading from Node 12 to Node 24 with AI really look like? The takeaway? The real breakthrough wasn't the model, but the test suite. Without integration tests, there is no migration, whether you use AI or not. That's the lesson hidden in the Bun story, and it's one most teams will miss while debating Rust versus Zig.

20 de may de 202629 min

We Ran DOOM in a Node.js Terminal. Now There's No Excuse for Your Legacy Native Code (with Paolo Insogna)

What started as a joke at the Node Collaborator Summit turned into the most compelling argument yet for why enterprises have no excuse left to avoid modernizing their native code. In this episode of The Node (and more) Banter, Luca and Matteo are joined by Paolo, Principal Software Engineer at Platformatic, who built "Project Destino" — because in Italian, destino means doom, and yes, that's exactly how we name things. That comment in London turned into a fully working DOOM port running at 35fps inside your terminal, with sound, powered entirely by Node.js FFI, OpenTUI, and a C library called DOOM Generic. In this episode, we cover: ✅ How Node.js's native FFI module lets you load and run any C library. No native addons, no compilation headaches ✅ Why the game loop lives in JavaScript (via setInterval) while the engine ticks happen across the FFI boundary ✅ The FFI performance story: from 150 nanoseconds per call down to 15, close to the theoretical minimum ✅ Node.js Single Executable Applications (SEA): ship everything — game, sound, native libraries — as one binary ✅ The enterprise reality: if FFI can run DOOM, it can run your legacy DLLs — and there's no migration excuse left ✅ What's next: llama.cpp via FFI, NVIDIA GPU experiments — and possibly Prince of Persia The takeaway? We didn't port DOOM because it made sense. We did it because the technology made it possible — and that's exactly the point. Node.js FFI changes the migration conversation for every enterprise sitting on legacy native code. If it runs DOOM, it runs your C library. No excuses.

13 de may de 202636 min

Predictive Autoscaling for Node.js: Why Reactive Systems Are Costing You More Than You Think

Many teams believe autoscaling is simple: set a CPU threshold and let Kubernetes handle it. But if you get a three-minute traffic spike and your pods need two minutes to start, users feel the lag, the spike ends, and your new pods show up too late. What if your infrastructure could predict traffic instead of just reacting? In this episode, Luca Maraschi and Matteo Collina challenge the usual autoscaling approach—HPA, KEDA, ECS, and more. They explain the predictive scaling algorithm Platformatic created for their Intelligent Command Center (ICC). Matteo explains why scaling is a nonlinear problem that the industry keeps trying to solve with linear solutions, and how thinking of a distributed system like a neural network can change how scaling decisions are made. In this episode, we cover: ✅ Why reactive autoscalers always use outdated data by design, and why this is a core flaw, not just a configuration issue ✅ The real cost of pod boot time that often gets ignored: spawn time, warmup, and traffic rebalancing ✅ Why CPU and memory are not the right metrics for Node.js, and what you should measure instead ✅ How Platformatic's algorithm checks event loop utilization from outside the thread, with no interference and no extra overhead ✅ The benchmark results: 99.47% success rate compared to 95% with KEDA and 90% with HPA, with P99 latency ten times better ✅ The 46-page white paper they published, and why they believe it's time to stop scaling out of fear The takeaways? Over-provisioning is not a safety net; it shows the model is broken. If your system cannot predict load, you will always have to choose between wasting money and getting worse performance. This episode, along with the white paper, argues that smarter scaling is now essential, not just a bonus.

6 de may de 202636 min