Kafka's Exactly-Once Delivery: The Truth Behind the Marketing
Kafka’s exactly-once delivery sounds like magic, but it’s more like a really good magic trick. It works beautifully inside Kafka’s world, but the moment your pipeline touches S3, a database, or any external system, those guarantees start to crack. In this episode, we pull back the curtain on what exactly-once really means, where it works, where it falls apart, and how to build systems that don’t implode when reality hits at 3 a.m.
Description
Every distributed systems engineer has been there. You build a beautiful Kafka pipeline, enable all the exactly-once flags, and feel invincible. Then someone asks if there will ever be duplicates in the database, and your stomach drops. Because the truth is, Kafka’s exactly-once delivery comes with fine print.
This episode breaks down the three flavors of message delivery in Kafka: at-most-once (fire and forget, so messages can be lost), at-least-once (nothing is lost, but duplicates are possible), and exactly-once (the dream, with conditions). We walk through how messages actually move through Kafka, from producers to brokers to consumers, and explain where things can go wrong at each step.
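To make the difference between the first two flavors concrete, here is a minimal simulation, not Kafka client code. It assumes a single consumer that commits after every message and crashes exactly once; the function name `run_consumer` and its parameters are hypothetical. The only thing that changes between the two modes is whether the offset is committed before or after the message is processed.

```python
def run_consumer(messages, commit_before_processing, crash_at=None):
    """Simulate a consumer that crashes once at `crash_at` and then restarts
    from the last committed offset. Returns the side effects that happened."""
    committed = 0    # next offset to read after a restart
    processed = []   # messages whose processing actually took effect
    crashed = False
    while committed < len(messages):
        offset = committed
        if commit_before_processing:
            # At-most-once: commit first, then process.
            committed = offset + 1
            if offset == crash_at and not crashed:
                crashed = True
                continue  # crash after commit: the message is skipped for good
            processed.append(messages[offset])
        else:
            # At-least-once: process first, then commit.
            processed.append(messages[offset])
            if offset == crash_at and not crashed:
                crashed = True
                continue  # crash before commit: the message is redelivered
            committed = offset + 1
    return processed


print(run_consumer(["a", "b", "c"], commit_before_processing=True, crash_at=1))
print(run_consumer(["a", "b", "c"], commit_before_processing=False, crash_at=1))
```

The first call loses "b" and the second processes it twice, which is the trade-off the two weaker guarantees describe.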
You’ll learn how Kafka achieves exactly-once semantics using idempotent producers and transactional APIs, and how Apache Flink extends these guarantees with checkpointing and coordinated commits. But we don’t stop at the success stories. We dig into the Achilles’ heel: external systems that don’t speak Kafka’s transactional language.
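The idempotent producer idea can be sketched as a toy model of the broker side. This is a simplification under stated assumptions, not Kafka's actual implementation: the class `PartitionLog` and its method are hypothetical, and the model tracks only the last sequence number per producer ID on a single partition.

```python
class PartitionLog:
    """Toy model of one partition that deduplicates producer retries."""

    def __init__(self):
        self.records = []
        self.last_seq = {}  # producer_id -> highest sequence number appended

    def append(self, producer_id, seq, value):
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            # A retry of something already written: acknowledge, do not append.
            return "duplicate"
        if seq != last + 1:
            # A gap means an earlier send is missing, so reject the write.
            raise ValueError("out-of-order sequence number")
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return "appended"


log = PartitionLog()
log.append(producer_id=7, seq=0, value="order-created")
# The producer never saw the ack and retries with the same sequence number.
log.append(producer_id=7, seq=0, value="order-created")
print(log.records)
```

The retry is acknowledged but not written again, so the log holds the record once even though the producer sent it twice.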
Through a concrete example of writing to S3 and Elasticsearch, we show exactly where duplicates creep in, why timeouts and partial failures make everything worse, and what you can actually do about it. Spoiler: the answer involves idempotent writes, outbox patterns, and accepting that at-least-once is often the practical reality.
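One way to picture an idempotent write is to derive the destination key from the record's Kafka coordinates, so a redelivered record overwrites itself instead of creating a second object. The sketch below uses a plain dictionary as a stand-in for an object store such as S3; the helper names and the key layout are assumptions for illustration, not a prescribed scheme.

```python
def object_key(topic, partition, offset):
    """Build a deterministic key: the same record always maps to the same key."""
    return f"{topic}/{partition}/{offset:020d}.json"


def write_record(store, topic, partition, offset, payload):
    """Write by overwriting, so replaying a record leaves the store unchanged."""
    store[object_key(topic, partition, offset)] = payload


store = {}  # stand-in for a bucket or an index
write_record(store, "orders", 0, 42, '{"id": "o1"}')
# A timeout hid the first success, so the pipeline delivers the record again.
write_record(store, "orders", 0, 42, '{"id": "o1"}')
print(len(store))
```

The pipeline is still at-least-once, but the duplicate delivery no longer produces a duplicate result.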
Kafka didn’t solve the Two Generals Problem or break the laws of distributed systems. It just gave us really good tools to handle chaos within a well-defined boundary. And honestly, that’s more than enough. This episode will help you understand exactly where that boundary is and how to design systems that work with it, not against it.
Key Topics
Message Delivery Guarantees: The difference between at-most-once, at-least-once, and exactly-once delivery semantics in Kafka.
Idempotent Producers: How Kafka uses producer IDs and per-partition sequence numbers to prevent duplicate writes when producers retry.
Transactional Consumers: Atomically committing consumer offsets together with the records produced from processing them, using Kafka’s transactional API.
Apache Flink and Exactly-Once: How Flink uses checkpointing and coordinated commits to extend exactly-once guarantees beyond simple Kafka-to-Kafka pipelines.
External Systems Integration: Why exactly-once breaks down when writing to S3, databases, and APIs that don’t support distributed transactions.
Practical Solutions: Idempotent writes, outbox patterns, and accepting at-least-once with downstream deduplication strategies.
Real-World Failures: Concrete examples of timeouts, partial failures, and how to handle them without losing data or creating duplicates.
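The outbox pattern from the topics above can be sketched with SQLite from the Python standard library. This is a minimal illustration under assumptions: the table names, `place_order`, and `relay` are hypothetical, and `publish` stands in for whatever actually sends the event to Kafka.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT, sent INTEGER DEFAULT 0)"
    )
    conn.commit()
    return conn


def place_order(conn, order_id):
    # One local transaction: the state change and the event commit together
    # or roll back together.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'placed')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)", (f"order_placed:{order_id}",)
        )


def relay(conn, publish):
    """Publish unsent outbox rows, marking each as sent after publishing."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(payload)  # a crash after this line means the row is sent again
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)


conn = open_db()
place_order(conn, "o1")
published = []
relay(conn, published.append)
print(published)
```

The relay is at-least-once by design: if it crashes between publishing and marking the row as sent, the event goes out again, which is why the pattern is usually paired with deduplication downstream.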
Get full access to Compiling Ideas at patrickkoss.substack.com/subscribe [https://patrickkoss.substack.com/subscribe?utm_medium=podcast&utm_campaign=CTA_4]