Smooth Scaling: System Design for High Traffic

Autoscaling in Production: When It Works and When It Doesn't with Zaigham Sarfaraz and Šimon Bučko

35 min · 9 April 2026

Description

In this episode, José Quaresma sits down with two Queue-it engineers, Zaigham Sarfaraz (Engineering Manager) and Šimon Bučko (Senior Software Engineer), to talk autoscaling in production. They cover the fundamentals of horizontal and vertical scaling, why stateless architecture matters for scaling out, and what happens when the metrics you're scaling on don't match your actual bottleneck. The conversation gets real when Zaigham shares a war story of autoscaling failing during an iPhone launch (one million users in one second) and how that experience reshaped how the team thinks about pre-scaling for extreme traffic. Šimon challenges the temptation to rely on default configurations and explains why the days you most need autoscaling to work are exactly the days it might not.

Episode page [https://www.queue-it.com/smooth-scaling-podcast/ep023-autoscaling-in-production/]

---

(00:00) - Introduction
(00:46) - What is autoscaling under the hood?
(03:25) - Why scaling down matters too
(03:53) - Horizontal vs. vertical scaling
(05:43) - When vertical scaling is the better choice
(07:56) - Stateful vs. stateless applications
(10:42) - Solving state for horizontal scaling
(12:14) - The role of load balancers
(14:31) - Choosing the right scaling metrics
(16:46) - Is serverless the silver bullet?
(21:34) - The cost paradox of autoscaling
(23:40) - iPhone launch: when the whole world wants to buy a product
(25:56) - Why autoscaling isn't enough for non-linear traffic
(30:37) - The fallacy of the rule of thumb
(32:48) - Rapid fire questions

Šimon Bučko is a Senior Software Engineer at Queue-it, working across full-stack development. He is an AWS Certified Solutions Architect Professional with strong experience in software architecture and bridging the gap between business needs and technical execution.

Zaigham Sarfaraz is an Engineering Manager at Queue-it with over 15 years of experience across frontend, backend, infrastructure, and people leadership. He is an AWS Certified Cloud Practitioner and plays a key role in ensuring stable system operations while contributing to the continuous improvement of Queue-it's backend architecture.

This podcast is hosted by José Quaresma, researched by Joseph Thwaites and produced by Perseu Mandillo.

© Queue-it, 2026

All episodes

25 episodes

Sovereign Cloud and Sovereign AI in Europe with Klaus Koefoed, CEO of T-Systems

Klaus Koefoed spent more than 25 years in IT services and consultancy, at Capgemini, Deloitte, and Accenture, before taking over as CEO of T-Systems in Northern Europe. In this episode of the Smooth Scaling Podcast, Klaus walks host José Quaresma through the rise of sovereign cloud: why a theme almost nobody raised two years ago turned urgent in early 2025, and what actually changed. They get into Europe's real position against the American hyperscalers, why the answer is rarely either/or, and how leaders should weigh the data, operational, technology, and legal layers of sovereignty. Klaus is candid that none of this comes for free: multi-cloud adds complexity, and the right answer depends entirely on who you are. The back half turns to sovereign AI, where T-Systems and NVIDIA have stood up a billion-euro Industrial AI Cloud, and where Europe's gigafactory ambitions, real use cases like simulated wind tunnels, and AI Scrum Teams come in. A grounded, practical look at building and running infrastructure when geopolitics is suddenly part of the architecture.

Episode page [https://queue-it.com/smooth-scaling-podcast/ep026-sovereign-cloud/]

---

(00:00) - Intro
(00:46) - From consulting to running T-Systems
(01:45) - When sovereignty went from niche to urgent
(03:54) - Where Europe stands: strengths and gaps
(06:50) - "Let's not be lemmings": a balanced approach
(08:32) - SaaS, optionality, and freedom of movement
(12:06) - The what-if scenarios CIOs miss
(14:00) - The levels of sovereignty: data, ops, tech, legal
(16:21) - How to actually evaluate European options
(20:38) - Treat your cloud like an insurance review
(23:08) - Hybrid, multi-cloud, and the move back to private
(27:50) - Sovereign AI and Europe's alternatives
(30:09) - Real use cases: digital twins and wind tunnels
(38:07) - Gigafactories and AI Scrum Teams
(40:21) - Rapid fire: resources and "scalability is..."

Klaus Koefoed is CEO of T-Systems Northern Europe, leading Deutsche Telekom's B2B IT-service business across the Nordics, UK and Ireland. He joined T-Systems in June 2023 from Capgemini, where he was VP and Nordic Head of Cloud Strategy & Transformation Advisory. Before that, he was a Partner at Deloitte Consulting, with earlier years at Accenture. He is based in Copenhagen and has 25+ years of experience in IT services and consultancy. His move from advising to operating arrived just as digital sovereignty stopped being a compliance footnote and became a board-level resilience question. Under Klaus, T-Systems Northern Europe has pushed hard on Sovereign Solutions for Europe, especially T Cloud Public (formerly known as Open Telekom Cloud), now a ten-year-old, enterprise-grade European public cloud. It also recently opened the Industrial AI Cloud, Europe's largest sovereign AI infrastructure, with 10,000 GPUs connected to the existing portfolio. That makes him one of very few executives in the region who can credibly talk about running sovereign cloud and sovereign AI at scale in Europe.

🔗 Connect
Klaus Koefoed: https://www.linkedin.com/in/klauskoefoed/
Host José Quaresma: https://www.linkedin.com/in/jose-quaresma/

This podcast is researched by Joseph Thwaites, produced by Perseu Mandillo, and brought to you by Queue-it, your virtual waiting room partner.

© Queue-it, 2026

9 June 2026 · 42 min

The Rise of Cloud Prem: Data Ownership in the Age of AI with Galileo's Sam Dhar

Sam Dhar has spent 14 years building infrastructure at Cisco, Amazon Alexa, and Adobe, and now works as Senior Staff Engineer and AI infrastructure leader at Galileo, the enterprise AI evaluation platform. In this episode of the Smooth Scaling Podcast, Sam walks host José Quaresma through Cloud Prem: deploying your full product stack inside the customer's own cloud environment instead of running it as SaaS. They get into why the model is resurging, and it mostly comes down to data. Enterprises want ownership and control, they carry a heavy compliance load (SOC 2, HIPAA, fully air-gapped government workloads), and they do not want a vendor sitting in the read path of their most sensitive data. Sam is candid about the hard parts. Cloud Prem can be a losing game on margins, deployment is the slowest thing in the pipeline, and every customer environment is different enough to reset the work. The conversation closes on AI: why it makes Cloud Prem urgent, the brutal GPU shortage, and why self-hosting an Opus-class model is still out of reach for most companies. A direct, practitioner-level look at where enterprise AI infrastructure is actually heading.

Episode page [https://queue-it.com/smooth-scaling-podcast/ep025-cloud-prem-and-ai/]

---

(00:00) - Intro
(01:08) - What Cloud Prem actually is
(06:05) - Why Cloud Prem is resurging now
(09:37) - Provider, vendor, customer: who owns what
(11:10) - "Data is paramount": the compliance driver
(14:29) - Shipping software into someone else's environment
(19:57) - When Cloud Prem becomes a losing game
(26:48) - Quality, and the control plane / data plane split
(28:50) - Monitoring without seeing the customer's data
(30:52) - Why Sam moved to AI evals
(34:56) - Self-hosting LLMs and the GPU bottleneck
(38:01) - Smaller runtimes, frontier-level intelligence
(41:46) - Why AI makes Cloud Prem urgent
(46:59) - Rapid fire: the one book to read
(49:01) - "Business equals scalability"

Satyam “Sam” Dhar is a Senior Staff Engineer and AI infrastructure leader at Galileo, where he designs systems that support real-time LLM workflows at enterprise scale. Prior to Galileo, he spent over six years at Adobe, contributing to AI-powered product development, evaluation platforms, and large-scale data systems. Earlier in his career at Amazon, he worked on high-throughput distributed services supporting Alexa’s device orchestration. Based in San Francisco, Sam’s insights and commentary have been featured in Newsweek, CNET, InfoQ, The New Stack, The Deep View, and others. He is also a Senior Member of the Institute of Electrical and Electronics Engineers.

🔗 Connect
Sam Dhar: https://www.linkedin.com/in/satyamdhar/
Host José Quaresma: https://www.linkedin.com/in/jose-quaresma/

This podcast is researched by Joseph Thwaites, produced by Perseu Mandillo, and brought to you by Queue-it, your virtual waiting room partner.

© Queue-it, 2026

19 May 2026 · 50 min

A Decade of Kubernetes Lessons with Chris Nesbitt-Smith

Chris Nesbitt-Smith has been running Kubernetes in production since version 0.4, long before pods, before managed services, before most of today's tooling existed. In this episode of Smooth Scaling, he sits down with José Quaresma to share what a decade of running Kubernetes for UK government citizen-facing services has taught him about scaling critical infrastructure. The conversation covers why Kubernetes was the least bad option (and largely still is), why relying on autoscaling means you've already lost, and how Gregor Hohpe's "guardrails versus lane assist" metaphor changes the way you think about capacity. Chris makes the case for climbing the service stack (SaaS first, then Functions as a Service, then Platform as a Service, and only reluctantly managed Kubernetes) and explains why tech is one of the only industries that builds critical systems without ever pricing the risk of failure. A direct, opinionated look at what scaling really demands when the stakes are real and the budget isn't infinite.

Episode page [https://queue-it.com/smooth-scaling-podcast/ep024-kubernetes-lessons/]

---

(00:01) - Intro
(01:23) - Running Kubernetes since v0.4 in UK government
(04:56) - Why pod rescheduling went full circle
(09:07) - "Brave and stupid": running alpha-stage K8s in production
(14:58) - Helm, DevOps as a job title, and cultural drift
(16:43) - Climb the service stack (SaaS → FaaS → PaaS → managed K8s)
(20:48) - Why engineers resist giving up control
(23:52) - Tech doesn't quantify risk the way every other industry does
(27:14) - If you're relying on autoscaling, it's already too late
(28:30) - The KubeCon Black Friday game: dropping requests as strategy
(33:03) - Graceful degradation up the stack
(35:34) - "Mostly myths": data sovereignty vs. data residency
(38:35) - Cloudflare and "deploy to the world" as a different paradigm
(41:53) - The legacy debt sitting in UK public sector tech
(46:03) - Rapid-fire: build advice, recommended reading, scalability is...

Chris Nesbitt-Smith is an independent technology strategist, a Kubernetes instructor at LearnKube, and the architect of the UK Government's National Digital Exchange. Based in London, he works at the intersection of policy, security, and modern infrastructure, advising UK and international government departments, multinational enterprises, and large NGOs on cloud-native transformation and DevSecOps. A regular speaker at KubeCon, DevSecCon, and Open Source Summit, he gives talks spanning container security, policy-as-versioned-code, and platform engineering. He also writes regularly at Cloudy with Chance of Freefall [https://www.linkedin.com/newsletters/cloudy-with-chance-of-freefall-7439561267528458241/].

🔗 Connect
Guest Chris Nesbitt-Smith: https://uk.linkedin.com/in/cnesbittsmith
Host José Quaresma: https://www.linkedin.com/in/jose-quaresma/

This podcast is researched by Joseph Thwaites, produced by Perseu Mandillo, and brought to you by Queue-it, your virtual waiting room partner.

© Queue-it, 2026

28 April 2026 · 49 min

Autoscaling in Production: When It Works and When It Doesn't with Zaigham Sarfaraz and Šimon Bučko

9 April 2026 · 35 min

Observability as a Product: Building Platforms Engineers Actually Use with Iris Dyrmishi

In this episode, José Quaresma speaks with Iris Dyrmishi, Senior Observability Engineer at Miro, about building an observability platform that hundreds of engineers actually trust and use. Iris explains how her team treats observability as an internal product, walks through Miro's tracing migration from Jaeger and Zipkin to OpenTelemetry with zero disruption, and shares how teams now use traces proactively to find bottlenecks before they become outages. The conversation also covers the honest downsides (alert noise, dashboard sprawl, and the cost of observability), including a recent example using eBPF and Grafana Beyla to uncover hidden networking expenses that transformed Miro's cloud bill.

Episode page [https://www.queue-it.com/smooth-scaling-podcast/ep022-observability-as-a-product-with-iris-dyrmishi/]

---

(00:00) - Intro
(00:59) - Building Observability as a Product at Miro
(04:08) - Migrating to OpenTelemetry
(09:21) - Industry Maturity and the Business Case
(12:02) - From Reactive to Proactive Observability
(14:34) - Logs vs. Tracing Explained
(18:04) - Team Ownership, AI, and Freedom
(24:38) - The Downsides and Costs of Observability
(29:58) - Rapid Fire and Close

Iris Dyrmishi is a Senior Observability Engineer at Miro, where she builds and maintains the company's observability platform. She started as a backend engineer before moving into SRE roles at Worten Portugal and Farfetch, where she developed her specialty in tracing and drove OpenTelemetry migrations across large engineering organisations without disrupting existing workflows. A CNCF Ambassador, co-organiser of Kubernetes Community Days Porto, and active voice in the observability community, she writes extensively about practical adoption challenges and has spoken at KubeCon EU and on the o11ycast podcast. Her guiding philosophy: observability is a team sport.

This podcast is hosted by José Quaresma, researched by Joseph Thwaites and produced by Perseu Mandillo.

© Queue-it, 2026

17 March 2026 · 33 min