Software Testing Unleashed - QA, DevEx & Quality Engineering

Building Trust with AI Agents - Henri Terho

21 min · 30 April 2026

Description

How to build trust into AI systems when they constantly change underneath you

🚨 Are we actually testing too much sometimes? Just because we run a lot of tests doesn’t mean we’ll find a lot of bugs. Here’s how we can solve this: Free Online Workshop [https://tul.fm/team]

"AI doesn't think, it doesn't analyze, it predicts." - Henri Terho

In this episode, I talk with Henri Terho, senior consultant and AI enthusiast, about why building trust in AI systems requires the same rigor we've always applied to software, just now at a whole new level. Henri explains how AI agents multiply both our successes and our mistakes, why prompting is harder than it looks, and why testers are uniquely positioned to thrive in this shift. We dig into the oracle problem, the communication trap, and why your test suite might soon matter more than your codebase.

Henri Terho [https://www.linkedin.com/in/henriterho/] is a Senior AI Consultant at Eficode with broad experience spanning regulated industries (automotive, banking, aerospace, and beyond) alongside a deep commitment to open-source collaboration. He has played a key role in fostering community-driven innovation, having served as chairman of the Tampere Entrepreneurship Society and co-founded Tampere Tribe to support the local startup culture. Henri’s passion for AI, quality assurance, and rapid software development is evident both in his industry work and in his ongoing PhD research on agile product innovation. He frequently shares his expertise on stage and in publications, championing lean practices and the latest AI advances to empower organizations worldwide.

Highlights:
* AI systems amplify both mistakes and successes at scale, so the checks, guardrails, and validation processes built around the model matter more than the model itself.
* Testing AI requires a shift from deterministic pass/fail checks to monitoring trends and mean time between failures, because non-deterministic outputs cannot be verified with a single green test.
* The communication problem with AI agents is structurally identical to the bug-report problem with humans: vague input produces generic, context-free output that misses the actual need.
* As AI-generated code becomes a black box, test specifications and acceptance criteria become the primary source of truth, making the tester's skill set central rather than peripheral.
* AI democratizes software creation by removing the need for programming knowledge, which surfaces long-ignored organizational problems such as document version control and missing single sources of truth.
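The shift from a single pass/fail check to trend monitoring can be illustrated with a small sketch. This is not a method described in the episode: it assumes a hypothetical `check` callable that returns True when one AI output is acceptable, runs it many times, and reports the pass rate together with the mean number of runs between failures, a run-count analogue of mean time between failures (MTBF). The 0.95 pass probability and the 0.9 acceptance threshold are made-up illustration values.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class TrendReport:
    runs: int
    failures: int
    pass_rate: float
    mean_runs_between_failures: float  # float("inf") when no failure was seen


def evaluate_trend(check: Callable[[], bool], runs: int) -> TrendReport:
    """Run a non-deterministic check many times and summarize the outcomes."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    outcomes = [bool(check()) for _ in range(runs)]
    failures = outcomes.count(False)
    pass_rate = (runs - failures) / runs
    # Total runs divided by failures: a discrete analogue of MTBF.
    mean_between = runs / failures if failures else float("inf")
    return TrendReport(runs, failures, pass_rate, mean_between)


# Hypothetical stand-in for judging one AI answer: passes about 95% of the time.
rng = random.Random(42)


def flaky_check() -> bool:
    return rng.random() < 0.95


report = evaluate_trend(flaky_check, runs=200)
print(report)
# Hypothetical quality bar: judge the trend, not a single green run.
print("acceptable:", report.pass_rate >= 0.9)
```

In a real pipeline the report would be stored per build, so that a falling pass rate or a shrinking gap between failures shows up as a trend even when the latest individual run happened to be green.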

All episodes

55 episodes

Why Traditional Testing Fails for AI Systems - Dušanka Lečić

From prompt failures to hallucinations: what breaks in AI testing

"For the same input we have a lot of different outputs, some of them can be similar, but yeah still non-determinism is completely there." - Dušanka Lečić

This time I talk with Dušanka Lečić about why testing chatbots breaks everything we know about traditional QA. She explains why chatbot bugs are invisible (they hide in prompts, retrieval logic, and chunks, not in code) and why the same input can produce dozens of valid outputs. Dušanka shares her framework for testing context retention, hallucination control, and accuracy, and reveals why stress testing a chatbot means checking for typos and user frustration, not system load.

Dušanka Lečić [https://www.linkedin.com/in/dusanka-lecic/] is a dynamic leader and technical expert with nearly a decade of experience steering software testing initiatives across international teams. As a Test Lead and Department Manager at Levi9, she specializes in performance testing, agile methodologies, and engineering excellence. Holding a Ph.D. in Technical Sciences, Dušanka blends academic insight with real-world execution and is a frequent contributor to industry conferences, mentoring programs, and expert communities. Her sessions offer a rich perspective on quality assurance, innovation, and leadership in fast-paced development environments.

Highlights:
* Chatbot bugs are invisible in the traditional sense because they live not only in code but also in prompts, retrieval logic, and response generation, which requires an entirely different debugging approach.
* Non-determinism in chatbot responses means multiple valid outputs exist for the same input, which breaks the classical pass/fail model and demands a wider definition of what counts as a correct test result.
* Traceability in chatbot testing must cover chunks, retrieval results, and queries, not just the final response, because without that full log, root-cause analysis of a wrong answer is nearly impossible.
* The CHAT framework structures chatbot testing around four concerns: context retention, hallucination control, accuracy and relevance, and a testing workflow that includes tracing, fixing, and retesting with similar queries.
* Stress testing for chatbots means checking responses to the misspellings, ambiguous terms, and bad wording that frustrated users produce, not measuring system performance under load.

28 May 2026 · 24 min
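The stress-testing highlight in this episode (misspellings and bad wording rather than load) lends itself to a small input generator. This is a minimal sketch under assumptions, not the CHAT framework from the episode: `typo_variants` is a hypothetical helper that derives reproducible misspelled variants of a user query by swapping, dropping, or doubling a letter. Each variant could then be sent to the chatbot under test and its answer compared with the answer to the clean query.

```python
import random


def typo_variants(query: str, count: int = 5, seed: int = 0) -> list[str]:
    """Return up to `count` distinct misspelled variants of `query`.

    Each variant applies one random edit: swap two adjacent letters,
    drop a letter, or double a letter. A fixed seed keeps runs reproducible.
    """
    rng = random.Random(seed)
    # Only letter positions are mutated, so spaces and punctuation stay intact.
    positions = [i for i, ch in enumerate(query) if ch.isalpha()]
    if len(positions) < 2:
        return []
    variants = set()
    attempts = 0
    while len(variants) < count and attempts < count * 20:
        attempts += 1
        i = rng.choice(positions)
        op = rng.choice(("swap", "drop", "double"))
        if op == "swap" and i + 1 < len(query) and query[i + 1].isalpha():
            mutated = query[:i] + query[i + 1] + query[i] + query[i + 2:]
        elif op == "drop":
            mutated = query[:i] + query[i + 1:]
        elif op == "double":
            mutated = query[:i] + query[i] + query[i:]
        else:
            continue  # swap not possible at this position; try again
        if mutated != query:  # e.g. swapping "ss" changes nothing
            variants.add(mutated)
    return sorted(variants)


for variant in typo_variants("How do I reset my password?", count=3):
    print(variant)
```

Keeping the seed fixed means the same misspelled queries are produced on every run, so a wrong answer to one of them can be traced, fixed, and retested with exactly the same input.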
Why Testers Are Safe Despite AI Hype - Mitko Mitev

From test planning to defect clustering: where AI already saves you 30% of the effort

"People should stop asking on interviews what's the difference between class and object. You should probably ask: What is MCP?" - Mitko Mitev

This time I talk to Mitko Mitev about how AI is reshaping our work as testers without replacing us. Mitko shows exactly where AI tools save real time across test planning, test case generation, and exploratory testing, and why human expertise remains non-negotiable for context, business logic, and validation. We go into the shift from writing scripts to instructing agents in plain language, how ISTQB's new AI syllabi prepare testers for what's coming, and why waiting another year to explore AI might already be too late.

With over 30 years in software quality assurance and more than 20 years as a Project and Test Manager, Mitko Mitev [https://www.linkedin.com/in/mitko-mitev-030522/] is recognized as one of South East Europe’s leading software testing experts. A dedicated advocate for the QA and testing professions, he has been instrumental in establishing and promoting international standards through his work with the ISTQB and as President of the South East European Testing Board (SEETB). Mitko also serves as Chief Editor of Quality Matters magazine and Chair of the SEETEST conference, both focused on advancing global best practices in software quality. Today, Mitko continues to develop and refine educational materials, books, and articles that help professionals deepen their expertise in software testing. He is also the founder and owner of Quality House, a leading outsourcing and consultancy company with offices in Bulgaria, Serbia, and Romania, which is proudly celebrating 21 years on the market delivering world-class independent testing services.

Highlights:
* AI challenges the tester's role by shifting mechanical and automation tasks to machines, but context, business logic, and predicting user behavior still require human control.
* Generating test cases and test data are the highest-value AI use cases in testing today: a task that takes a human a month can be completed in roughly a week with AI support.
* Business stakeholders can contribute directly to test automation when plain-language instructions replace scripting, widening the group of people who can drive test activities.
* Scripting knowledge in languages like TypeScript or Python remains necessary because someone has to verify that AI-generated scripts actually do what they are supposed to do.
* ISTQB offers two dedicated syllabi covering AI in testing: one on how to test AI products, and a newer one on how to use generative AI in day-to-day testing work, updated every three to six months.

21 May 2026 · 23 min
How to Build QA Culture in Your Company - Filip Barszcz

Why your stakeholders, devs and PMs all mean something different by "quality"

"The truth is that we are feedback givers for all the development teams." - Filip Barszcz

In this episode, I talk with Filip Barszcz about what most companies get wrong when they claim to have a quality culture. Filip reveals why stakeholders, developers, and product owners all speak different languages when they say "quality," and how he translates between them to build real buy-in for a testing strategy. He walks through his playbook for introducing change without burning out the team: small wins first, honesty about short-term productivity drops, and color-coded tables that make executives eager to invest in QA. If you've ever struggled to get testing taken seriously beyond "just click through it before release," this conversation gives you the roadmap.

Filip Barszcz [https://www.linkedin.com/in/filip-barszcz/] is a full-time QA Chapter Leader with over 10 years of experience in the IT industry. Throughout his career, he has collaborated with renowned organisations such as SCIB (Santander Corporate & Investment Banking), T-Mobile, Capital.Com, and IQVIA. He specialises in building and refining quality assurance processes, mentoring QA professionals, and fostering close collaboration between QA and development teams. As a strategic QA leader, he has driven major organisational transformations, from building QA departments from the ground up to restructuring teams for greater efficiency and alignment with business goals. He has defined test strategies, designed automation architecture, and implemented multi-level testing, from unit to end-to-end coverage.

Highlights:
* QA professionals are feedback givers for all development teams, not just testers, and that broader role is what makes quality improvement possible across an entire organization.
* Introducing too many changes at once creates team fatigue and resistance; one significant change per quarter, fully measured before the next begins, keeps adoption stable.
* Framing quality failures as financial cost, by calculating what a late-stage defect costs to fix, is the argument that moves business stakeholders from skepticism to willingness to invest.
* Consulting everyone affected by a change before rolling it out, and letting them shape parts of it, turns potential resisters into partial owners who advocate for the new approach.
* Visibility of QA work, through regular reports and status updates that use tables and clear metrics, closes the gap between what testers do and what management and stakeholders actually see.

14 May 2026 · 29 min
Why Quality Engineers Fail at Business Thinking - Marta Firlej

How to prove the value of your testing work in money, before the next budget cut hits

"That's the goal of every company. Every company, government and country make money." - Marta Firlej

In this episode, I talk with Marta Firlej about a topic most testers avoid: money. Marta explains why understanding how your company actually makes money is crucial for QA professionals, and walks through the real costs behind salaries, automation projects, and test activities that stakeholders care about. She shares a practical calculation method to assess whether test automation is worth the investment, and challenges us to translate testing value into business numbers.

Marta Firlej [https://www.linkedin.com/in/firlejmarta/] is the creator and organizer of the testing conference test:fest [https://www.testfest.pl] in Wrocław, Poland. She is a proud member of the Polish and European testing community, organizing various events, sharing knowledge and experience as a speaker, and taking part as an attendee. She currently works as Head of FS Testing Practice at Capgemini in Poland. Throughout her career, she has held different positions in industries such as finance, healthcare, and edtech, always with quality at heart. Her favorite part is working with people.

Highlights:
* Testers who cannot explain what a failed release costs in money will lose budget arguments, because managers treat testing as an abstract cost rather than a risk mitigation tool.
* Automation return on investment depends on maintenance costs and longevity: automating a product that may be cut or heavily revised in the near term produces negative value.
* The real cost of an employee to a company is roughly double the gross salary once employer taxes, benefits, bench time, and overhead are included.
* Testers are responsible for producing quality reports and communicating risk proactively, because clients will not ask for information they do not know they need.
* Understanding how a company earns money, who the key stakeholders are, and what decisions they make is a precondition for translating test results into business value.

More Links with Insights:
* Testwarez Conference [https://testwarez.pl]

7 May 2026 · 19 min
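The calculation method discussed in this episode is not reproduced in these notes, so the following is only a generic break-even sketch under assumptions. The 2.0 cost multiplier reflects the "roughly double the gross salary" highlight; the `AutomationCase` class and every number in the example (hourly salary, build and maintenance hours, manual effort replaced) are hypothetical inputs to swap for your own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutomationCase:
    gross_hourly_salary: float          # gross salary per hour
    cost_multiplier: float              # real employee cost / gross salary (about 2.0)
    build_hours: float                  # one-off effort to build the automation
    maintenance_hours_per_month: float  # ongoing upkeep of the automated suite
    manual_hours_per_month: float       # manual test effort the suite replaces

    @property
    def hourly_cost(self) -> float:
        """Real cost of one hour of work to the company."""
        return self.gross_hourly_salary * self.cost_multiplier

    def net_value(self, months: int) -> float:
        """Money saved minus money spent after `months`; negative means a loss."""
        saved_hours = self.manual_hours_per_month * months
        spent_hours = self.build_hours + self.maintenance_hours_per_month * months
        return (saved_hours - spent_hours) * self.hourly_cost

    def break_even_month(self) -> Optional[float]:
        """Months until the automation pays for itself, or None if it never does."""
        monthly_gain = self.manual_hours_per_month - self.maintenance_hours_per_month
        if monthly_gain <= 0:
            return None  # maintenance eats all savings: the suite never pays off
        return self.build_hours / monthly_gain


case = AutomationCase(gross_hourly_salary=30.0, cost_multiplier=2.0,
                      build_hours=300, maintenance_hours_per_month=20,
                      manual_hours_per_month=60)
print(case.break_even_month())    # 7.5
print(case.net_value(months=6))   # -3600.0
print(case.net_value(months=24))  # 39600.0
```

With these made-up inputs the automation breaks even after 7.5 months: it is still 3,600 currency units in the red after six months and 39,600 ahead after two years. If the product is likely to be cut or heavily revised before the break-even point, the investment loses money, which is the longevity argument from the highlights.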