Software Testing Unleashed - QA, DevEx & Quality Engineering

Why Traditional Testing Fails for AI Systems - Dušanka Lečić

24 min · May 28, 2026

Description

From prompt failures to hallucinations: what breaks in AI testing

🚨 Are we actually testing too much sometimes? Just because we run a lot of tests doesn’t mean we’ll find a lot of bugs. Here’s how we can solve this: Free Online Workshop [https://tul.fm/team]

"For the same input we have a lot of different outputs, some of them can be similar, but yeah still non-determinism is completely there." - Dušanka Lečić

This time I talk with Dušanka Lečić about why testing chatbots breaks everything we know about traditional QA. She explains why chatbot bugs are invisible (they hide not only in code, but in prompts, retrieval logic, and chunks) and why the same input can produce dozens of valid outputs. Dušanka shares her framework for testing context retention, hallucination control, and accuracy, and reveals why stress testing a chatbot means checking for typos and user frustration, not system load.

Dušanka Lečić [https://www.linkedin.com/in/dusanka-lecic/] is a dynamic leader and technical expert with nearly a decade of experience steering software testing initiatives across international teams. As a Test Lead and Department Manager at Levi9, she specializes in performance testing, agile methodologies, and engineering excellence. Holding a Ph.D. in Technical Sciences, Dušanka blends academic insight with real-world execution, and is a frequent contributor to industry conferences, mentoring programs, and expert communities. Her sessions offer a rich perspective on quality assurance, innovation, and leadership in fast-paced development environments.

Highlights:

* Chatbot bugs are invisible in the traditional sense because they live not only in code, but in prompts, retrieval logic, and response generation, requiring a different debugging approach entirely.
* Non-determinism in chatbot responses means multiple valid outputs exist for the same input, which breaks the classical pass/fail model and demands a wider definition of what counts as a correct test result.
* Traceability in chatbot testing must cover chunks, retrieval results, and queries, not just the final response, because without that full log, root-cause analysis of a wrong answer is nearly impossible.
* The CHAT framework structures chatbot testing around four concerns: context retention, hallucination control, accuracy and relevance, and a testing workflow that includes tracing, fixing, and retesting with similar queries.
* Stress testing for chatbots means checking responses to misspellings, ambiguous terms, and bad wording that frustrated users produce, not measuring system performance under load.
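
The point about non-determinism can be made concrete with a small sketch. It is not taken from the episode: the `Expectation` class, the helper functions, and the sample answers are all hypothetical. The idea is to replace an exact-match assertion with a check that any acceptable answer must pass, and then to measure how many sampled answers pass it.

```python
from dataclasses import dataclass, field


@dataclass
class Expectation:
    """What any acceptable answer must contain and must not claim (hypothetical)."""
    required: list[str]                                  # facts every valid answer mentions
    forbidden: list[str] = field(default_factory=list)   # signs of a hallucinated claim


def is_acceptable(answer: str, exp: Expectation) -> bool:
    """True if the answer mentions every required term and no forbidden one."""
    text = answer.lower()
    has_required = all(term.lower() in text for term in exp.required)
    has_forbidden = any(term.lower() in text for term in exp.forbidden)
    return has_required and not has_forbidden


def pass_rate(answers: list[str], exp: Expectation) -> float:
    """Share of sampled answers that satisfy the expectation."""
    if not answers:
        return 0.0
    return sum(is_acceptable(a, exp) for a in answers) / len(answers)


# Hypothetical answers, standing in for repeated calls to a chatbot with the
# same question (and with misspelled or badly worded variants of it).
sampled_answers = [
    "You can return items within 30 days of delivery.",
    "Returns are accepted for 30 days after the order arrives.",
    "Our policy allows returns within 90 days.",  # wrong return period
]

expectation = Expectation(required=["30 days"], forbidden=["90 days"])
rate = pass_rate(sampled_answers, expectation)
print(f"pass rate: {rate:.2f}")
```

Two differently worded answers pass and the one with the wrong return period fails, so the result is a pass rate rather than a single pass/fail verdict. Substring matching is a deliberately simple stand-in here; a real suite would likely need a more robust comparison.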

All episodes

56 episodes

Why COBOL Developers Prefer Writing Tests in Java - Szymon Wałachowski, Bartosz Filipek

When no IBM tool fits, two engineers built their own testing layer

"The ultimate goal is to make people's life easier." - Bartosz Filipek

What happens when 40 years of custom decisions stack so high that even the standard testing tools from your own vendor stop working? With Bartosz Filipek and Szymon Wałachowski I talk about exactly that situation: a mainframe environment so deep in its own customization that the only way forward was to build one final bridge to the outside world. We dig into how they created a Java-based unit testing tool for COBOL developers, and what surprised me most is that COBOL programmers find it easier to write assertions in Java than in their own first language. We also get into code coverage, integration with tools like SonarQube and X-Ray, and the long road of getting something as basic as a service account approved.

Szymon Wałachowski [https://www.linkedin.com/in/walachowski/] is a Senior Software Engineer, Professional Nerd, and part-time QA team member (the plot twist nobody saw coming). He has broken things in JavaScript, Blockchain, ML, JVM, and Mainframes, which taught him to love QA so much that he now builds testing tools to help with modernization. He enjoys modernizing legacy systems more than building from scratch, because making ancient code dance with modern APIs while everyone says "that’s impossible" is his idea of fun.

Bartosz Filipek [https://www.linkedin.com/in/bartosz-filipek-aa7878a0] is an IT Architect with over 10 years of experience in software development, specializing in Java, Scala, and TypeScript. Passionate about automation and fostering strong collaboration between development teams and customers, he places a strong emphasis on quality assurance and testing. In recent years, his focus has shifted toward architectural design and guiding teams toward future-ready solutions, while leveraging his extensive background in backend, frontend, and DevOps.

Highlights:

* Building one final, well-designed customization layer that bridges a legacy environment to standard tooling is more sustainable than letting ad-hoc customizations accumulate indefinitely.
* COBOL developers found it easier to write unit test assertions in Java than in COBOL itself, because the Java API was designed so that no prior Java knowledge is required.
* When a unit testing capability is missing in a legacy stack, the absence cascades: reporting, release validation, and integration with tools like SonarQube and X-Ray all become blocked as a consequence.
* Code coverage for COBOL programs is technically achievable through the IBM debugger's built-in line-tracking option, without requiring a custom implementation.

More Links with Insights:

* Testwarez Conference [https://testwarez.pl/]

June 4, 2026 · 24 min
Why Traditional Testing Fails for AI Systems - Dušanka Lečić

May 28, 2026 · 24 min
Why Testers Are Safe Despite AI Hype - Mitko Mitev

From test planning to defect clustering: where AI already saves you 30% effort

"People should stop asking on interviews what's the difference between class and object. You should probably ask: What is MCP?" - Mitko Mitev

This time I talk to Mitko Mitev about how AI is reshaping our work as testers, without replacing us. Mitko shows exactly where AI tools save real time across test planning, test case generation, and exploratory testing, and why human expertise remains non-negotiable for context, business logic, and validation. We go into the shift from writing scripts to instructing agents in plain language, how ISTQB's new AI syllabi prepare testers for what's coming, and why waiting another year to explore AI might already be too late.

With over 30 years in software quality assurance and more than 20 years as a Project and Test Manager, Mitko Mitev [https://www.linkedin.com/in/mitko-mitev-030522/] is recognized as one of South East Europe’s leading software testing experts. A dedicated advocate for the QA and testing professions, he has been instrumental in establishing and promoting international standards through his work with the ISTQB and as President of the South East European Testing Board (SEETB). Mitko also serves as Chief Editor of Quality Matters magazine and Chair of the SEETEST conference, both focused on advancing global best practices in software quality. Today, Mitko continues to develop and refine educational materials, books, and articles that help professionals deepen their expertise in software testing. He is also the founder and owner of Quality House, a leading outsourcing and consultancy company with offices in Bulgaria, Serbia and Romania, proudly celebrating 21 years on the market and delivering world-class independent testing services.

Highlights:

* AI challenges the tester's role by shifting mechanical and automation tasks to machines, but context, business logic, and user behavior prediction still require human control.
* Generating test cases and test data are the highest-value AI use cases in testing today: a task that takes a human a month can be completed in roughly a week with AI support.
* Business stakeholders can contribute directly to test automation when plain-language instructions replace scripting, widening the group of people who can drive test activities.
* Scripting knowledge in languages like TypeScript or Python remains necessary because someone has to verify that AI-generated scripts actually do what they are supposed to do.
* ISTQB offers two dedicated syllabi covering AI in testing: one on how to test AI products, and a newer one on how to use generative AI in day-to-day testing work, updated every three to six months.

May 21, 2026 · 23 min
How to Build QA Culture in Your Company - Filip Barszcz

Why your stakeholders, devs and PMs all mean something different by "quality"

"The truth is that we are feedback givers for all the development teams." - Filip Barszcz

In this episode, I talk with Filip Barszcz about what most companies get wrong when they claim to have a quality culture. Filip reveals why stakeholders, developers, and product owners all speak different languages when they say "quality" and how he translates between them to build actual buy-in for testing strategy. He walks through his playbook for introducing change without burning out the team: small wins first, honesty about short-term productivity drops, and color-coded tables that make executives eager to invest in QA. If you've ever struggled to get testing taken seriously beyond "just click through it before release," this conversation gives you the roadmap.

Filip Barszcz [https://www.linkedin.com/in/filip-barszcz/] is a full-time QA Chapter Leader with over 10 years of experience in the IT industry. Throughout his career, he has collaborated with renowned organisations such as SCIB (Santander Corporate & Investment Banking), T-Mobile, Capital.Com, and IQVIA. He specialises in building and refining quality assurance processes, mentoring QA professionals, and fostering close collaboration between QA and development teams. As a strategic QA leader, he has driven major organisational transformations, from building QA departments from the ground up to restructuring teams for greater efficiency and alignment with business goals. He has successfully defined test strategy, designed automation architecture, and implemented multi-level testing, from unit to end-to-end coverage.

Highlights:

* QA professionals are feedback givers for all development teams, not just testers, and that broader role is what makes quality improvement possible across an entire organization.
* Introducing too many changes at once creates team fatigue and resistance; one significant change per quarter, fully measured before the next begins, keeps adoption stable.
* Framing quality failures as financial cost, by calculating what a late-stage defect costs to fix, is the argument that moves business stakeholders from skepticism to willingness to invest.
* Consulting everyone affected by a change before rolling it out, and letting them shape parts of it, turns potential resisters into partial owners who advocate for the new approach.
* Visibility of QA work, through regular reports and status updates that use tables and clear metrics, closes the gap between what testers do and what management and stakeholders actually see.

May 14, 2026 · 29 min
Why Quality Engineers Fail at Business Thinking - Marta Firlej

How to prove your testing work in money - before the next budget cut hits

"That's the goal of every company. Every company, government and country make money." - Marta Firlej

In this episode, I talk with Marta Firlej about a topic most testers avoid: money. Marta explains why understanding how your company actually makes money is crucial for QA professionals, and walks through the real costs behind salaries, automation projects, and test activities that stakeholders care about. She shares a practical calculation method to assess whether test automation is worth the investment, and challenges us to translate testing value into business numbers.

Marta Firlej [https://www.linkedin.com/in/firlejmarta/] is the inventor and organizer of the testing conference test:fest [https://www.testfest.pl] in Wrocław, Poland. She is a proud member of the Polish and European testing community: she organizes events, shares knowledge and experience as a speaker, and participates as an attendee. She currently works as Head of FS Testing Practice at Capgemini in Poland. Throughout her career, she has held various positions, always with quality at heart, in industries such as finance, healthcare, and edtech. Her favorite part is working with people.

Highlights:

* Testers who cannot explain what a failed release costs in money will lose budget arguments, because managers treat testing as an abstract cost, not a risk mitigation tool.
* Automation return on investment depends on maintenance costs and longevity: automating a product that may be cut or heavily revised in the near term produces negative value.
* The real cost of an employee to a company is roughly double the gross salary, once employer taxes, benefits, bench time, and overhead are included.
* Testers own the responsibility to produce quality reports and communicate risk proactively, because clients will not ask for information they do not know they need.
* Understanding how a company earns money, who the key stakeholders are, and what decisions they make is a precondition for translating test results into business value.

More Links with Insights:

* Testwarez Conference [https://testwarez.pl]
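
The show notes do not spell out Marta's calculation method, so the following is only an illustrative sketch of the two highlights about automation ROI and employee cost. The function names and every figure are assumptions made for the example, not numbers from the episode.

```python
def employer_cost(gross_salary: float, multiplier: float = 2.0) -> float:
    """Approximate full monthly cost of an employee.

    The 2.0 default mirrors the "roughly double the gross salary" rule of
    thumb from the highlights; the real factor varies by company and country.
    """
    return gross_salary * multiplier


def automation_net_value(
    build_cost: float,
    maintenance_per_month: float,
    manual_saving_per_month: float,
    months_in_use: int,
) -> float:
    """Manual effort saved minus build and maintenance cost over the usage period."""
    monthly_gain = manual_saving_per_month - maintenance_per_month
    return monthly_gain * months_in_use - build_cost


# Hypothetical figures, for illustration only.
monthly_cost = employer_cost(gross_salary=5_000)  # full cost of one person-month
build = 2 * monthly_cost          # two person-months to build the automation
maintenance = monthly_cost / 10   # a tenth of a person-month per month to maintain
saving = monthly_cost / 2         # half a person-month of manual testing saved

short_lived = automation_net_value(build, maintenance, saving, months_in_use=3)
long_lived = automation_net_value(build, maintenance, saving, months_in_use=12)
print(f"3 months: {short_lived:.0f}, 12 months: {long_lived:.0f}")
```

With these assumed numbers the automation loses money if the product is cut after three months and pays off over twelve, which is the longevity effect the highlight describes.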

May 7, 2026 · 19 min