Software Testing Unleashed - QA, DevEx & Quality Engineering
From prompt failures to hallucinations: what breaks in AI testing

🚨 Are we actually testing too much sometimes? Just because we run a lot of tests doesn’t mean we’ll find a lot of bugs. Here’s how we can solve this: Free Online Workshop [https://tul.fm/team]

"For the same input we have a lot of different outputs, some of them can be similar, but yeah still non-determinism is completely there." - Dušanka Lečić

This time I talk with Dušanka Lečić about why testing chatbots breaks everything we know about traditional QA. She explains how chatbot bugs are invisible – they hide in prompts, retrieval logic, and chunks, not in code – and why the same input can produce dozens of valid outputs. Dušanka shares her framework for testing context retention, hallucination control, and accuracy, and reveals why stress testing a chatbot means checking for typos and user frustration, not system load.

Dušanka Lečić [https://www.linkedin.com/in/dusanka-lecic/] is a dynamic leader and technical expert with nearly a decade of experience steering software testing initiatives across international teams. As a Test Lead and Department Manager at Levi9, she specializes in performance testing, agile methodologies, and engineering excellence. Holding a Ph.D. in Technical Sciences, Dušanka blends academic insight with real-world execution, and is a frequent contributor to industry conferences, mentoring programs, and expert communities. Her sessions offer a rich perspective on quality assurance, innovation, and leadership in fast-paced development environments.

Highlights:

* Chatbot bugs are invisible in the traditional sense because they live not only in code, but in prompts, retrieval logic, and response generation, requiring a different debugging approach entirely.
* Non-determinism in chatbot responses means multiple valid outputs exist for the same input, which breaks the classical pass/fail model and demands a wider definition of what counts as a correct test result.
* Traceability in chatbot testing must cover chunks, retrieval results, and queries, not just the final response, because without that full log, root-cause analysis of a wrong answer is nearly impossible.
* The CHAT framework structures chatbot testing around four concerns: context retention, hallucination control, accuracy and relevance, and a testing workflow that includes tracing, fixing, and retesting with similar queries.
* Stress testing for chatbots means checking responses to misspellings, ambiguous terms, and bad wording that frustrated users produce, not measuring system performance under load.
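
To make the highlights above a little more concrete, here is a minimal Python sketch of three of the ideas: a trace record that keeps the query and retrieved chunks next to the response, a check that accepts any wording as long as the required facts are present, and a generator of misspelled queries for "stress" runs. It is not from the episode or from the CHAT framework itself. The names `ChatTrace`, `missing_facts`, and `typo_variants` are hypothetical, the refund example is made up, and substring matching is only a crude stand-in for whatever comparison a real project would use.

```python
from dataclasses import dataclass


@dataclass
class ChatTrace:
    """Hypothetical log record: what you need to trace a wrong answer."""
    query: str
    retrieved_chunks: list[str]
    response: str


def missing_facts(trace: ChatTrace, required_facts: list[str]) -> list[str]:
    """Return the required facts absent from the response (empty list = pass).

    Case-insensitive substring matching, so differently worded answers
    can all pass as long as they contain the facts.
    """
    text = trace.response.lower()
    return [fact for fact in required_facts if fact.lower() not in text]


def typo_variants(query: str) -> list[str]:
    """Create simple misspellings by swapping adjacent letters."""
    variants = []
    for i in range(len(query) - 1):
        a, b = query[i], query[i + 1]
        if a.isalpha() and b.isalpha() and a != b:
            variants.append(query[:i] + b + a + query[i + 2:])
    return variants


# Made-up example data: one query, one retrieved chunk, three possible answers.
query = "How long is the refund window?"
chunks = ["Refunds are accepted within 30 days of purchase."]
required = ["refund", "30 days"]

good_a = ChatTrace(query, chunks, "You can request a refund within 30 days.")
good_b = ChatTrace(query, chunks, "Our refund window is 30 days from purchase.")
bad = ChatTrace(query, chunks, "Refunds are possible within 14 days.")

for trace in (good_a, good_b, bad):
    gaps = missing_facts(trace, required)
    status = "PASS" if not gaps else f"FAIL, missing {gaps}"
    print(f"{status}: {trace.response}")
```

Both well-formed answers pass even though their wording differs, while the third fails on the missing "30 days". Because the failing trace still carries its query and chunks, a tester can see whether retrieval or generation went wrong. Each string from `typo_variants(query)` could then be sent to the chatbot and checked against the same `required` list.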