SE Radio 717: Eric Tschetter on Decoupling Observability

1 h 0 min · Apr 23, 2026

Description

In this episode, host Amey Ambade sits down with Eric Tschetter, co-founder of Apache Druid and Chief Architect at Imply, to dissect the move toward decoupling observability. To begin, they define the three pillars (logs, metrics, and traces) and consider why the rise of microservices has made traditional, tightly coupled stacks a major source of pain. Coupled systems can lead to vendor lock-in, prohibitive scaling costs, and operational complexity. Drawing parallels to the separation that took place in the Business Intelligence world, Tschetter presents an architectural solution with four distinct layers: Ingest/Route, Data Storage, Query/Compute, and Visualization. This framework aims to provide the flexibility that monolithic observability tools lack. The conversation moves into the practical challenges and benefits of the decoupled model, focusing heavily on data portability and the role of technologies such as OpenTelemetry in standardizing schemas so that data can flow freely between multiple back-ends. A significant portion of the discussion is dedicated to the Query/Compute layer, specifically how Apache Druid addresses the demands of real-time analytics on observability data, including indexing strategies and unifying results across hot and cold storage. They also delve into operational survival, covering smart sampling to preserve high-value signals, best practices for buffering and backpressure, and the governance models required for multiple teams to safely access the same data lake. The episode concludes with an honest look at the complexity trade-offs and a roadmap for organizations considering a migration from a coupled vendor stack.

All episodes

708 episodes

SE Radio 723: Dave Airlie on Linux Kernel Maintenance

Dave Airlie, a Distinguished Engineer at Red Hat, speaks with host Gregory M. Kapfhammer about Linux kernel maintenance. After an overview of the scale and structure of the Linux kernel, they dive deep into the review and validation of kernel patches, drawing on examples from the GPU subsystem. After discussing the features and benefits of the Linux kernel's maintenance model, they explore kernel maintenance best practices and the tools that support them. Dave and Gregory also discuss topics such as the integration of Rust code in the Linux kernel and the ways in which AI-driven code review is influencing kernel maintenance.

Jun 3, 2026 · 1 h 9 min

SE Radio 722: Dwayne McDaniel on the Engineering Challenges of Secrets Management

Dwayne McDaniel [https://trail.gitguardian.com/api/t/c/usr_XgQEQPgwFZ78282oE/tsk_CSQx5hsyFYHpAHx2r/enc_U2FsdGVkX19aR9lGtbabCxEhb9Yde_hsokM0Br2H8cO0MuhkXtGOlxqoSa2kzhx9AJEkM4SrYvH4PEzf842ZL9fm-omZUuEVXLdnzhA74ugphvs8lMXgwE63YENVZ9Ax], developer advocate at GitGuardian.com [http://gitguardian.com/], joins host Priyanka Raghavan to talk about the engineering challenges of secrets management. They explore what "secrets" really are in modern systems, which goes far beyond passwords to include API keys, tokens, certificates, and machine identities, and how "secret sprawl" emerges across the SDLC. Drawing on reports from GitGuardian and Verizon, they discuss the growing scale of secret leaks and why credential abuse and phishing remain dominant attack vectors. They examine common leak points, from code repos and logs to CI/CD pipelines, containers, and SaaS integrations, and how cloud, DevOps, and AI tooling are amplifying risks. Priyanka quizzes Dwayne about recent supply chain attacks in the PyPI and Trivy ecosystems, highlighting recurring root causes like poor access control, long-lived credentials, and weak security hygiene. Finally, they consider detection, response, and modern solutions (short-lived credentials, secret scanning, and identity-based approaches like OWASP NHIR and SPIFFE/SPIRE), ending with practical advice for engineers to reduce blast radius and design for secure secret lifecycle management.

May 27, 2026 · 52 min

SE Radio 721: Rob Moffat on Risk-First Software Development

In this episode, Rob Moffat, author of Risk-First Software Development and chief technical architect at the FinTech Open Source Software Foundation (FINOS), speaks with host Brijesh Ammanath about how all of software development is actually risk management. Rob introduces the concept of "risk-first software development," which sits in the context of existing methodologies like Scrum and Kanban. Showcasing multiple real-world project patterns to illustrate how things can go wrong when risk is ignored, he makes the case that risk should be the primary lens behind every development decision, from architecture to prioritization. Through various examples, he shows how every developer action can be viewed as a risk trade-off and why making that explicit can lead to better outcomes. The conversation takes a deep dive into the risk-first framework and how teams can apply it in their existing processes.

May 20, 2026 · 52 min

SE Radio 720: Martin Dilger on Understanding Eventsourcing

Martin Dilger, founder and CEO of Nebuilt GmbH, speaks with host Giovanni Asproni about event sourcing, a software architecture pattern in which, rather than storing just the current state of your data, you store a sequence of events that represents every change that has ever happened in the system. This episode starts by introducing the vocabulary around event sourcing, highlighting its relationship with event modeling, event streaming, and event storming. Martin describes some of the pros and cons of the approach, including which systems it is most suitable for. The conversation ends with guidance on how to get started with event sourcing, for both greenfield and legacy systems.

May 13, 2026 · 55 min

SE Radio 719: Birol Yildiz on Building an Agentic AI SRE

Birol Yildiz, CEO and co-founder of iLert, joins host Kanchan Shringi to explore how iLert built an AI SRE — an autonomous agent for handling production incidents — and what the experience revealed about building AI agents in the real world. Birol explains why incident response is a fundamentally agentic problem, where the unpredictability of novel incidents makes rule-based runbooks insufficient and reasoning models essential. He describes how the AI SRE evolved from an early browser-based approach to its current architecture, built around two key ingredients: reasoning models and the Model Context Protocol. The conversation examines the four layers of the AI SRE in depth: an orchestration layer that routes requests and abstracts model providers; a knowledge layer built on plain text memory and agentic search rather than vector databases; an evaluation framework based on recorded live investigations replayed against new model versions; and a human-in-the-loop constraint layer. The episode concludes with practical advice for teams building agents: own your context completely, avoid off-the-shelf frameworks that obscure what enters the model, and get out of the way of the reasoning model rather than over-prescribing its steps.

May 6, 2026 · 53 min
