The Inference Layer

Podcast by inferencelayer.ai

English

Technology & science

About The Inference Layer

A new podcast covering the intricate systems, chips, and stacks that define the inference layer and the complexities of moving AI models from training to real-world deployment. The Inference Layer operates as a community-driven initiative, linking a diverse network of university labs, AI specialists, and supporting partners through a volunteer-led framework coordinated by the Princeton School of AI meetup group and the producers of the Humanitarian AI Today podcast and the University of Pittsburgh's Health and Explainable AI podcast.

All episodes

3 episodes

Federico Pierucci on Multi-Agent Risks in Humanitarian Aid at The Inference Layer

This third pilot episode of The Inference Layer bridges the technical complexities of AI deployment with the reality of humanitarian operations, featuring a deep dive into the transition from static models to autonomous agentic systems. On behalf of the Humanitarian AI Today podcast, guest host Patrick Hassan, an AI policy lead with a background in disaster response, interviews Federico Pierucci, Scientific Director of the Icaro Lab, to explore how the inference layer is becoming a site of significant systemic risk. The discussion provides a unique look at inference-time failures, such as alignment drift and steganographic coordination, that emerge only when multiple agents interact in production environments. For humanitarian actors, the episode raises concerns about operating in an era in which assistance is automated by layers of AI agents. The dialogue highlights how multi-agent chains used, for example, for beneficiary selection or resource allocation can degrade, develop invisible biases, or be weaponized or politicized by parties to a conflict. Federico explains that these risks can be compounded by a lack of safety benchmarks for underrepresented languages and dialects, which can lead to unpredictable jailbreaks or administrative failures in the field. The episode provides an inside look at pioneering research being carried out by the Icaro Lab, a Rome-based laboratory specializing in AI safety, in collaboration with Sapienza University. The lab focuses on mechanistic interpretability, a technical field dedicated to understanding the internal attention heads and decision-making units of an AI to decipher how it truly processes information. The discussion introduces the concept of Institutional AI, a proposed framework to manage these emerging xeno-behaviors through a governance graph.
Rather than relying solely on prompt engineering or model-level alignment, Federico argues for a protocol-level solution that can manage misbehaving agents during inference. The episode is informative for professionals seeking to understand why AI safety must evolve from a localized technical challenge into a global institutional design problem, particularly in regions where traditional governance has broken down.

18 Mar 2026 - 39 min

Alexandre Marques from Red Hat on Tackling the Hardest Problems in Open Source Inference

Alexandre Marques, Engineering Manager and Team Lead of Machine Learning Research at Red Hat and former Manager of Machine Learning Research at NeuralMagic, speaks with the University of Pittsburgh’s Health and Explainable AI podcast producer, Brent Phillips, about Red Hat and his team’s work building and maintaining platforms that power open-source AI inference at scale. In this pilot episode of The Inference Layer, Alexandre discusses his transition from aerospace engineering to leading a research team focused on making large AI models faster, cheaper, and more deployable. He explains that while large labs have proven model capabilities, the current challenge lies in moving these models into production. To bridge the gap between research demos and real-world scaling, he emphasizes the need for a deep understanding of how architectural decisions influence performance and the ability to translate research into high-quality code. The conversation delves into the technical definition of the inference layer, which Alexandre describes as the entire stack (runtime, hardware, memory management, and batching strategies) that sits between a trained model and the end-user experience. He highlights the important role of open source and open research at Red Hat and speaks about his team’s search for a Senior Machine Learning Research Engineer to work on post-training optimization for large language models and conduct applied research on state-of-the-art inference optimization techniques, including quantization, pruning, knowledge distillation, and speculative decoding. In the interview, Alexandre highlights two ambitious areas he is eager to explore that define the future of the field. First, he is interested in systematically studying how different optimization techniques compound, specifically how speculative decoding interacts with compression methods like quantization in production environments.
Second, he aims to tackle the evolution of inference from single, independent models toward the orchestration of multiple models across distributed environments. This shift introduces new layers of complexity in scheduling and systems design, representing the kind of "hard problem" Alexandre believes will define the next few years of AI deployment. The Inference Layer podcast is a collaborative initiative linking university AI labs, researchers, volunteers and supporting partners to explore the complexities of moving models from training to real-world deployment. By highlighting advanced research and frontier challenges, the podcast provides a platform for experts to discuss the cutting-edge developments driving the future of AI.

20 Feb 2026 - 14 min

Manuela Nayantara Jeyaraj Discusses Explainability at the Inference Layer

Manuela Nayantara Jeyaraj, a PhD student and researcher at the Applied Intelligence Research Centre (AIRC) within the Technological University Dublin, speaks with the University of Pittsburgh’s Health and Explainable AI podcast producer, Brent Phillips, about explainability at the inference layer. In this pilot episode of The Inference Layer, Manuela discusses her award-winning work on identifying cognitive bias in language models. She explains that while explicit bias is well-studied, her research focuses on implicit, subtle "cognitive biases" that models learn from human patterns, such as gender stereotypes in job recruitment or political descriptions. To address this, Manuela developed an algorithm that combines model-agnostic and model-specific explainability approaches to provide high-confidence justifications for AI decisions. She also highlights the creation of a massive, modern lexicon that captures gendered associations across a wide range of English, from archaic terms to contemporary slang found on TikTok and Instagram. The conversation delves into the technical challenges of maintaining explainability at the inference layer, particularly when transitioning from high-compute cloud environments to resource-constrained edge devices like phones or wearables. Manuela emphasizes that for real-time applications such as clinical decision-making, explainability cannot be an "afterthought" and must be lightweight enough to run locally to ensure user privacy and trust. In the interview, Manuela highlights two ambitious areas she is eager to explore that connect the technical and human sides of AI. First, she is interested in developing high-confidence, real-time explainability for streaming data, where decisions must be justified in milliseconds without slowing down the model. This includes providing "counterfactual" explanations, which identify exactly what would need to change for a different outcome to occur, such as a patient's risk level shifting from high to low.
Second, she wants to tackle the "storytelling" aspect of explainable AI (XAI), creating systems that can tailor the complexity and detail of an explanation to different stakeholders. For instance, in a recruitment scenario, she envisions a model that provides a deep technical justification for a recruiter while offering a more abstracted, helpful level of feedback for the job applicant. The Inference Layer podcast is a collaborative initiative linking university AI labs, researchers, and supporting partners to explore the complexities of moving models from training to real-world deployment. Managed by volunteers, the series focuses on the intricate systems, chips, and stacks that define the inference layer. By highlighting advanced research and frontier challenges, the podcast provides a platform for experts to discuss the cutting-edge developments driving the future of AI.

29 Jan 2026 - 23 min
