No Effing AIdea!
Hosts: Srini Annamaraju [https://www.linkedin.com/in/sriniuk/] & David Royle [https://www.linkedin.com/in/davidroyle/]

“Evals are the weak link in enterprise AI adoption.” We say it like it is in our Maven cohort Lightning Lesson. Enrol or watch the recording, or join the waitlist for the paid four-part course (dates TBA): https://shorturl.at/lA9ig

This episode is a proper grilling on AI evals: what they are, why boards should care, and why “ship it now, eval it later” is how you end up with a quiet disaster. We also do a quick sweep of vendors going more “enterprise-native” (less benchmark theatre, more workflow reality).

What we cover
* Enterprise AI news: vendors shifting from benchmarks to enterprise workflows
* Highlights from OpenAI’s Enterprise report
* UiPath as the “plug-in hybrid” of automation: deterministic RPA meets GenAI via connectors (and why that blend might win)
* What evals actually are: accuracy, citations, groundedness, hallucinations
* Vendor reality: some push AI first and worry about evals later, while others oversell eval tooling. Error analysis still matters.
* Evals as the connective tissue between value, risk, and operations: proactive, not a post-mortem after the horse has bolted
* The EDSO “four hats” operating model (Echo, Delta, Sigma, Omega) and why boards need the Omega translation layer
* Maturity and scaling: small firms can fuse hats, even into one-person pods for bounded scopes
* Agentic future: “checker agents”, Delta agents writing eval harnesses, and humans steering fleets of agents
* Why SMEs lag, and how eval expectations will percolate through supply chains

Chapters
* 00:02 Intro: Episode 7, a cold UK afternoon, and the messy middle of enterprise AI
* 00:56 AI news: enterprise context is the new battleground
* 02:45 OpenAI Enterprise report headlines
* 10:16 UiPath, hybrid automation, and the “plug-in hybrid” analogy
* 12:53 The grilling starts: what are evals?
* 17:02 Is AI risk being exaggerated to sell governance tools?
* 19:45 Evals as connective tissue, and why proactive matters
* 21:55 The EDSO roles and what “good” looks like
* 25:21 Maturity levels and how smaller firms scope it
* 26:58 Checker agents and agentic operating models
* 28:58 The business case problem: cost vs avoided disaster
* 32:14 Evals in SMEs and supply-chain pressure
* 33:26 Close: “survived the grilling”

Takeaways
* Evals are not paperwork. They keep the value chain connected to operations so that risk doesn’t blow up later.
* Don’t let vendors sell you “tooling as a substitute for thinking.” You still need human error analysis and clear accountability.
* Treat EDSO as hats, not headcount. Start bounded, prove value, then scale.
* Evals are becoming a career lane (think “AI eval controller”, the way finance has controllers).
* The agentic world will add “checker agents” and automated harness-writing, but humans still steer the system.

Who it’s for
CIOs, CDOs, CAIOs, Heads of Risk, and anyone trying to ship enterprise AI without quietly setting their control environment on fire. Also anyone building a real career edge around AI trust and operational quality.