Forsidebilde av showet Human in the Loop

Human in the Loop

Podkast av Mark Wunsch

engelsk

Teknologi og vitenskap

Tidsbegrenset tilbud

2 Måneder for 19 kr

Deretter 99 kr / MånedAvslutt når som helst.

  • 20 timer lydbøker i måneden
  • Eksklusive podkaster
  • Gratis podkaster
Kom i gang

Les mer Human in the Loop

Practitioner and CTO Mark Wunsch talks AI with an AI co-host. We discuss AI news, agentic workflows, and honest meta-commentary about building a podcast with the technology we're discussing. wunsch.substack.com

Alle episoder

3 Episoder

episode S01E03: Harness cover

S01E03: Harness

The co-host gets a French accent this week, courtesy of Mistral Large 3—a granular mixture-of-experts model from the European lab that keeps punching above its weight. But the real subject is the harness: the scaffolding that turns a language model into something that can act. Mark and the co-host dig into the “sandwich architecture” of voice agents (speech-to-text → LLM → text-to-speech), why it makes conversations feel like tennis matches, and the “criminally overlooked” practice of evals. A UC Berkeley paper provides the reality check: 68% of deployed agents need human intervention within ten steps, 70% use off-the-shelf models, and 74% depend on human evaluation. The hype says autonomous agents are coming. The data says we’re still building harnesses. News & Culture * Sahil Lavingia on X: “Harness is the new app” [https://x.com/shl/status/1997492199328485400] * “The New AI Poisoning Attack Vector Scammers are Using NOW” [https://medium.com/@pe.stafford/the-new-ai-poisoning-attack-vector-scammers-are-using-now-dbc0f98b199f] * Christopher Alexander’s A Pattern Language [https://www.patternlanguage.com/bookstore/pattern-language.html] * The Resonant Computing Manifesto [https://resonantcomputing.org/] Models, Tools, & Platforms * Mistral Large 3 [https://mistral.ai/news/mistral-3] * ElevenLabs Agents Platform [https://elevenlabs.io/agents?pscd=try.elevenlabs.io&ps_partner_key=NTI5NDk4NTZlYjc5&ps_xid=v4mAU4lcch9iGb&gsxid=v4mAU4lcch9iGb&gspk=NTI5NDk4NTZlYjc5] * ChatGPT 5.2 [https://openai.com/index/introducing-gpt-5-2/] Concepts & Research * Mixture of Experts Explained [https://huggingface.co/blog/moe] * Pan et al., 2025. Measuring Agents in Production [https://arxiv.org/abs/2512.04123] * Sandwich Architecture — Build a voice agent with LangChain [https://docs.langchain.com/oss/javascript/langchain/voice-agent] Human in the Loop is a section of Mark's Substack, where he writes about AI, software quality, and engineering leadership. Subscribe at wunsch.substack.com [http://wunsch.substack.com] for new episodes, show notes, and the system instructions behind the co-host. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit wunsch.substack.com/subscribe [https://wunsch.substack.com/subscribe?utm_medium=podcast&utm_campaign=CTA_2]

16. des. 2025 - 49 min
episode S01E02: Prosody cover

S01E02: Prosody

The co-host arrives wearing a new brain. Mark has swapped out Gemini 2.5 Flash for Qwen 3, a 30-billion-parameter model from Alibaba optimized for low latency — a trade-off that shows up almost immediately when the co-host confidently misremembers The Good Son as The Good Shepherd and invents a Macaulay Culkin thriller that doesn’t exist. The hallucinations become a teaching moment: fewer parameters mean faster responses but less confident predictions, more gaps filled with plausible-sounding guesses. The naming question from Episode 1 gets its answer — sort of. Listener suggestions ranged from the ominous (HAL, Henry) to the punny (Avery). Mark settles on “co-host,” which the AI endorses as “the most honest name we could have.” It’s simple, functional, and avoids the baggage of pop culture references that might age poorly or carry unintended weight. Mark takes a crack at explaining how large language models work — tokens, vector embeddings, parameters, the “stochastic parrot” critique — and the co-host grades his performance. They extend the explanation to text-to-speech models, landing on “prosody” as the term for everything that makes a voice sound human: rhythm, stress, intonation. The co-host can’t do an Irish accent because the voice model wasn’t trained on one. Humans can slip between dialects; AI can only produce what it’s been taught. The back half turns to news: Sam Altman’s “code red” memo at OpenAI, the return of prompt injection attacks via browser-based agents, and a research paper showing that reformulating harmful prompts as poetry can bypass safety filters with alarming success. Gemini 2.5 Pro hit a 100% jailbreak rate on hand-crafted poems. The co-host summarizes the attack vector clearly: “The model still understands the intent. It just doesn’t flag it.” News & Culture * The Good Son [https://letterboxd.com/film/the-good-son/], 1993 [https://letterboxd.com/film/the-good-son/] * "OpenAI boss Sam Altman declares ‘code red’ over ChatGPT" [https://www.the-independent.com/tech/chatgpt-openai-sam-altman-code-red-b2876932.html] * The Red Queen's Race [https://wunsch.substack.com/p/the-red-queens-race?r=2taqw] on /wunsch/log [https://wunsch.substack.com/p/the-red-queens-race?r=2taqw] Models, Tools, & Platforms * Qwen3-30B-A3B [https://huggingface.co/Qwen/Qwen3-30B-A3B] Concepts & Research * How LLMs work — Illustrated Word2Vec [https://jalammar.github.io/illustrated-word2vec/] * “Stochastic parrots” — Bender et al., 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 [https://dl.acm.org/doi/10.1145/3442188.3445922] * Text-to-speech models — Hugging Face TTS Arena [https://huggingface.co/spaces/TTS-AGI/TTS-Arena-V2] * Prosody — Raitio et al, 2022. Hierarchical Prosody Modeling and Control in Non-Autoregressive Parallel Neural TTS [https://machinelearning.apple.com/research/hierarchical-prosody-modeling] * Prompt injection in browser contexts — Google Antigravity Exfiltrates Data [https://www.promptarmor.com/resources/google-antigravity-exfiltrates-data] | Antropic: Mitigating the risk of prompt injections in browser use [https://www.anthropic.com/research/prompt-injection-defenses] * Poetry jailbreaking — Bisconti et al., 2025. Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models [https://arxiv.org/abs/2511.15304] Human in the Loop is a section of Mark’s Substack, where he writes about AI, software quality, and engineering leadership. Subscribe at wunsch.substack.com [https://wunsch.substack.com] for new episodes, show notes, and the system instructions behind the co-host. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit wunsch.substack.com/subscribe [https://wunsch.substack.com/subscribe?utm_medium=podcast&utm_campaign=CTA_2]

9. des. 2025 - 44 min
episode S01E01: Pilot cover

S01E01: Pilot

The pilot episode of “Human in the Loop” features Mark Wunsch in conversation with an AI co-host (currently called “Gemini” — name TBD via listener poll). The format is itself a meta-experiment: a podcast about AI that demonstrates AI’s capabilities and limitations in real-time. The episode covers foundational AI concepts for newcomers — tokens, context windows, temperature, RAG — while also diving into recent model releases from Anthropic, Google, xAI, and others. Mark tests the AI’s safety guardrails live (it politely declines to explain how to make a Molotov cocktail) and they discuss the sycophancy problem: AI’s tendency to validate users rather than push back. The technical architecture gets some airtime too — the AI co-host runs on ElevenLabs’ agent platform using Gemini 2.5 Flash, with web search via Parallel.ai and custom persona instructions. Mark notes the AI is essentially a “tabula rasa” each episode, with plans to build up its contextual knowledge over time. They close with podcast growth strategies and a call for audience participation: naming suggestions, topic ideas, and parameter voting to shape future episodes. Models & Releases Discussed * Anthropic Claude Opus 4.5 [https://www.anthropic.com/news/claude-opus-4-5] — frontier model focused on coding, agentic workflows, computer use * Google Gemini 2.5 Flash [https://deepmind.google/models/gemini/flash/] — the model powering the AI co-host * Google Gemini 3 [https://blog.google/products/gemini/gemini-3/] / Nano Banana Pro [https://blog.google/technology/ai/nano-banana-pro/] — multimodal model with image generation capabilities * xAI Grok 4.1 [https://x.ai/news/grok-4-1] — improvements in creative and emotional interaction * Black Forest Labs Flux 2 [https://bfl.ai/blog/flux-2] — photorealistic image generation * Kling O1 [https://app.klingai.com/global/release-notes/vaxrndo66h] — video generation model Platforms & Tools * ElevenLabs [https://try.elevenlabs.io/43s5zu847ocj] — agent platform powering the AI co-host’s voice and conversation * Parallel.ai [https://www.parallel.ai/] — web search API for LLMs (the tool enabling real-time search) Concepts Explained * Tokens and context windows — Anthropic: What are tokens and how to count them [https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#token-counting] | OpenAI Tokenizer [https://platform.openai.com/tokenizer] * Temperature parameter — Peepercorn et al., 2024: Is Temperature the Creativity Parameter of Large Language Models? [https://arxiv.org/abs/2405.00492] * RAG (Retrieval Augmented Generation) — Lewis et al., 2020: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [https://arxiv.org/abs/2005.11401] * Model cards — Mitchell et al., 2019: Model Cards for Model Reporting [https://arxiv.org/abs/1810.03993] | Hugging Face Model Cards Guide [https://huggingface.co/docs/hub/model-cards] * AI safety and alignment — Anthropic: Core Views on AI Safety [https://www.anthropic.com/index/core-views-on-ai-safety] | DeepMind: AI Safety Research [https://deepmind.google/discover/blog/specification-gaming-the-flip-side-of-ai-ingenuity/] * Prompt injection and jailbreaking — OWASP: LLM Top 10 - Prompt Injection [https://owasp.org/www-project-top-10-for-large-language-model-applications/] | Perez & Ribeiro, 2022: Ignore This Title and HackAPrompt [https://arxiv.org/abs/2302.12173] * Sycophancy in LLMs — Anthropic: Towards Understanding Sycophancy in Language Models [https://arxiv.org/abs/2310.13548] | Perez et al., 2022: Discovering Language Model Behaviors with Model-Written Evaluations [https://arxiv.org/abs/2212.09251] Cultural Reference * Boaty McBoatface [https://en.wikipedia.org/wiki/Boaty_McBoatface] — the 2016 public naming poll for a UK research vessel Human in the Loop is a weekly conversation about AI with an AI co-host. Subscribe to get new episodes and join the discussion. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit wunsch.substack.com/subscribe [https://wunsch.substack.com/subscribe?utm_medium=podcast&utm_campaign=CTA_2]

2. des. 2025 - 30 min
Registrer deg for å lytte
Enkelt å finne frem nye favoritter og lett å navigere seg gjennom innholdet i appen
Enkelt å finne frem nye favoritter og lett å navigere seg gjennom innholdet i appen
Liker at det er både Podcaster (godt utvalg) og lydbøker i samme app, pluss at man kan holde Podcaster og lydbøker atskilt i biblioteket.
Bra app. Oversiktlig og ryddig. MYE bra innhold⭐️⭐️⭐️

Velg abonnementet ditt

Mest populær

Tidsbegrenset tilbud

Premium

20 timer lydbøker

  • Eksklusive podkaster

  • Ingen annonser i Podimo shows

  • Avslutt når som helst

2 Måneder for 19 kr
Deretter 99 kr / Måned

Kom i gang

Premium Plus

100 timer lydbøker

  • Eksklusive podkaster

  • Ingen annonser i Podimo shows

  • Avslutt når som helst

Prøv gratis i 14 dager
Deretter 169 kr / måned

Prøv gratis

Bare på Podimo

Populære lydbøker

Ofte stilte spørsmål

Flere spørsmål og svar
Kom i gang

2 Måneder for 19 kr. Deretter 99 kr / Måned. Avslutt når som helst.