The Digital Transformation Playbook

The Outcome Density Scorecard: Measuring AI Value Beyond Hours Saved

9 min · 12 May 2026

Description

AI value is often overstated when organisations rely on hours saved, usage data, or self-reported productivity. This episode reframes AI measurement around outcome density, where value is proven through better workflows, stronger controls, and reduced organisational drag. It explores how leaders can judge AI by the quality and efficiency of completed outcomes. The key takeaway is that AI creates enterprise value when it improves controlled, repeatable outcomes with less friction and burden.

TL;DR / At a Glance

  • Hours saved is only a weak supporting signal
  • AI value depends on completed outcomes improving
  • More output can increase rework and risk
  • Review, governance, and workload costs matter
  • Workflow-level measures reveal real performance change
  • Leaders should scale AI where outcome density rises

If your AI programme looks “successful” because prompts are up and hours saved are easy to quote, you might be optimising the wrong thing. We make the case that activity metrics are comforting but weak, because they don’t prove the business is delivering better outcomes, faster decisions, or stronger financial performance.

We walk through why hours saved became the default measure, and why the saved time often evaporates inside the working day through coordination, review, and fragmented time. Then we introduce a sharper idea for enterprise AI ROI: outcome density. It asks a simple, demanding question: are we producing more valuable, controlled outcomes per unit of total organisational input, including review effort, management attention, exception handling, and risk capacity? That shift exposes a common trap where AI increases output while quietly raising rework, escalations, and governance load.

To make it practical, we break down an Outcome Density Scorecard built around six dimensions: flow, quality, economics, workload, risk and control, and learning and capability. We also show how leaders should apply these measures at workflow level, from document work and customer support to software engineering, finance operations, and agentic workflows where traceability and supervisory intervention matter even more.

If you want AI measurement that stands up in the boardroom, this gives you a clearer dashboard and better decisions on what to scale, redesign, or stop.

If this helped, subscribe for more on enterprise AI strategy, share the episode with a colleague who owns your AI metrics, and leave a review telling us which scorecard dimension your organisation struggles with most.

Support the show [https://www.buymeacoffee.com/KGilmurray]

Contact my team and me to get business results, not excuses.

☎️ https://calendly.com/kierangilmurray/results-not-excuses
✉️ kieran@gilmurray.co.uk
🌍 www.KieranGilmurray.com
📘 Kieran Gilmurray | LinkedIn [https://www.linkedin.com/in/kierangilmurray/]
🦉 X / Twitter: https://twitter.com/KieranGilmurray
📽 YouTube: https://www.youtube.com/@KieranGilmurray
📕 Want to learn more about agentic AI? Read my new book on Agentic AI and the Future of Work: https://tinyurl.com/MyBooksOnAmazonUK
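As a rough illustration of the outcome density question above, the sketch below divides controlled, accepted outcomes by total organisational input. The class, the field names, the hour-based units, and the sample figures are all assumptions made for illustration; the episode does not prescribe a formula, and risk capacity is left out because it does not reduce neatly to hours.

```python
from dataclasses import dataclass


@dataclass
class WorkflowPeriod:
    """Hypothetical inputs for one workflow over one period (all names assumed)."""
    accepted_outcomes: int    # completed outcomes that passed controls
    production_hours: float   # time spent producing the work
    review_hours: float       # review effort
    management_hours: float   # management attention
    exception_hours: float    # exception handling and rework


def outcome_density(period: WorkflowPeriod) -> float:
    """Accepted outcomes per hour of total organisational input."""
    total_input = (
        period.production_hours
        + period.review_hours
        + period.management_hours
        + period.exception_hours
    )
    if total_input <= 0:
        raise ValueError("total input must be positive")
    return period.accepted_outcomes / total_input


# Illustrative figures only: after AI adoption, output rises and
# production hours fall, but review and exception handling grow.
before = WorkflowPeriod(100, 400.0, 60.0, 20.0, 20.0)
after = WorkflowPeriod(120, 300.0, 180.0, 40.0, 80.0)

print(f"before: {outcome_density(before):.3f} outcomes per hour")
print(f"after:  {outcome_density(after):.3f} outcomes per hour")
```

With these made-up numbers, 100 production hours are "saved" and output rises by 20 outcomes, yet the density is the same in both periods because the extra review and exception effort absorbs the gain. That is the trap the description refers to.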


All episodes

252 episodes


Why Your AI Focus Group Keeps Saying Three

You spend years building a product, polish the packaging, nail the pitch… then you hit the terrifying question: is anyone actually going to buy it? We dig into a 2025 research result from PyMC Labs and Colgate-Palmolive that aims straight at that fear with AI market research, synthetic consumers, and large language models that can simulate purchase intent at scale.

TL;DR / At a Glance

  • the core problem with direct Likert ratings and why LLMs collapse to neutral threes
  • how semantic similarity rating converts free-text responses into numerical scores using embeddings and cosine similarity
  • why follow-up AI grading helps but still trails the embedding-based approach
  • what 57 real product surveys and 9,300 human responses reveal about accuracy and distribution matching
  • how persona prompting reproduces real demographic patterns across age and income constraints
  • why zero-shot LLM methods can beat supervised machine learning models trained on the same domain

The shocker is that the first attempt fails badly. When you make models like GPT-4 or Gemini answer a classic Likert scale with a single number, they hedge and pile up on neutral “3” ratings. The fix is not “better AI”; it is better questioning.

Google NotebookLM Agents help us unpack semantic similarity rating: let the model respond in natural language, convert that text into embeddings, and map it to five anchor statements using cosine similarity. You get fast, automated scoring without stripping away the model’s reasoning. From there, we pressure-test the method against thousands of real survey responses across dozens of personal care product concepts, then look at whether AI personas actually reflect real constraints like age and income.

We also compare the approach with traditional machine learning models such as LightGBM, and dig into an underrated advantage: synthetic consumers can produce richer, more candid qualitative feedback than many human panels.

If you care about product testing, consumer insights, or the future of focus groups, listen through and tell us where you’d trust this and where you wouldn’t. Subscribe, share with a colleague, and leave a review with your take: would you let synthetic consumers influence a real launch?

Paper: http://arxiv.org/abs/2510.08338 [https://t.co/W5BlqI49ci]
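The semantic similarity rating idea described above (free-text response, then embeddings, then cosine similarity against five anchor statements) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, the 2-D vectors are toy stand-ins for real embeddings from an embedding model, and the step that turns similarities into a rating distribution (subtract the lowest similarity, then normalise) is an assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rating_distribution(response_vec, anchor_vecs):
    """Turn similarities to the anchor statements into a probability
    distribution over ratings (assumed normalisation: shift by the
    minimum similarity, then divide by the sum)."""
    sims = [cosine(response_vec, anchor) for anchor in anchor_vecs]
    lowest = min(sims)
    shifted = [s - lowest for s in sims]
    total = sum(shifted)
    if total == 0:
        # All anchors equally similar: fall back to a uniform distribution.
        return [1 / len(sims)] * len(sims)
    return [s / total for s in shifted]


def expected_rating(dist):
    """Mean rating on a 1..N scale under the distribution."""
    return sum((i + 1) * p for i, p in enumerate(dist))


# Toy stand-ins for embeddings: anchors for ratings 1..5 are spread
# between 0 and 90 degrees; the response sits on the rating-4 anchor.
anchors = [
    (math.cos(math.radians(d)), math.sin(math.radians(d)))
    for d in (0, 22.5, 45, 67.5, 90)
]
response = (math.cos(math.radians(67.5)), math.sin(math.radians(67.5)))

dist = rating_distribution(response, anchors)
print("distribution over ratings 1-5:", [round(p, 3) for p in dist])
print("expected rating:", round(expected_rating(dist), 2))
```

Because the output is a distribution over the five ratings rather than a single number, it can be compared with the spread of human survey responses, which is the kind of distribution matching the episode discusses.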

Yesterday · 22 min

AI-First Strategy at Scale: Pega's Roadmap with David Vidoni

Token subsidies are fading, AI prices are rising, and suddenly the fun part of experimentation comes with a nasty surprise: runaway spend. We dig into what that shift means for CIOs and IT leaders who still need to ship results, protect budgets, and prove ROI. If you have spent time counting tokens or worrying that one enthusiastic pilot will burn through a month’s AI budget, this conversation is for you.

David Vidoni, CIO at Pega, shares why predictable cost matters as much as model capability and how “charging for outcomes” changes the way you govern AI. We talk about the practical tension between creativity and cost control, and why leaders should pause and ask whether AI is genuinely the best tool for a given challenge. The goal is not to slow innovation down, but to stop wasting energy on spend anxiety and refocus on measurable business value.

We also get concrete on delivery: how Blueprint supports a design-first approach that clarifies what you are building before you build it, reduces costly mistakes, and speeds up time to first release. You will hear real internal stats, plus what it takes to deliver secure, compliant, repeatable outcomes rather than variable answers. Finally, we explore agentic AI wins in legal and contract work, including significant hours saved and major ticket deflection.

Listen, then subscribe, share with a fellow CIO or product leader, and leave a review with your biggest AI cost or governance challenge.

15 June 2026 · 6 min

Enterprise AI Will Not Scale Until You Redesign Work

Your AI can write a tidy email summary, but that is not the job. The real leap is from passive text generation to agentic AI that can read context, plan a sequence of steps, use tools through APIs, and execute actions inside real enterprise systems. That leap is thrilling, and it is also where most organisations hit the wall: plenty of pilots, very little production impact, and a growing fear of what happens when an autonomous agent is allowed anywhere near procurement, customer data, or payments.

TL;DR:

  • why AI investment keeps rising while production success stays low
  • the scaling wall: latency, compute cost, fragile error handling, messy data
  • the trust gap when autonomous agents can touch procurement, payments, and live systems
  • process inertia and the trap of paving the cow path
  • pragmatic AI mindset: hyper-specialised utility over sci-fi general intelligence
  • six pillars of agentic AI: tool use, action, memory, perception, planning, orchestration
  • multi-agent systems as modular digital specialists that isolate risk and raise accuracy

We use Google NotebookLM Agents to take insights from a Deloitte AI Institute report produced with Google Cloud and unpack why scaling enterprise AI is so hard and what actually changes when you build goal-oriented agents. The agents break down the practical architecture behind autonomous digital workers, including memory and reflection, multimodal perception, and planning that turns an ambiguous goal into an executable workflow. They also dig into multi-agent systems, where specialised agents work like a kitchen brigade rather than one giant generalist model, and why that modularity improves accuracy while reducing the blast radius when something fails.

Autonomy without governance is just risk at speed, so we get specific about controls: an agent OS hub-and-spoke model for visibility, FinOps guardrails and kill switches to stop runaway compute spend, and a defence-in-depth approach to security. That includes linguistic guardrails against prompt injection, sandboxing, semantic checks with constitutional AI auditing before actions execute, and infrastructure-level threat hunting. We also cover IDAMA, identity and access management for agents, so permissions stay least-privilege and accountability stays human-owned.

Finally, we bring it back to reality: change management, process redesign, and data gravity. You will hear concrete case studies in accounts payable automation and an agentic knowledge assistant with citations, plus why Apache Iceberg and cross-cloud lakehouse patterns matter for querying data where it lives.

Subscribe, share, and leave a review if this helped, and tell us what task you would trust an agent to run first.

14 June 2026 · 23 min

Kieran Gilmurray x Matt Healy: The Reality of Agentic AI

AI is moving fast, but enterprise leaders are starting to ask a sharper question: are we getting value for the money we’re spending? Matt Healy from Pega joins us to unpack what “agentic transformation” looks like when it has to survive real-world constraints like compliance, security, and customer-facing reliability, not just a slick prototype.

TL;DR:

  • extending AI-driven development into the platform with coding agents such as GitHub Copilot, Codex, and Claude Code
  • deploying agents that run predictably against rules, regulations, and compliance needs
  • shifting from token-based consumption to outcome-based agentic pricing for predictable ROI
  • why vendor pricing changes can flip an AI use case from profit to loss
  • using AI to analyse legacy systems, translate code into natural language, and guide modernisation
  • combining AWS legacy analysis with Blueprint to support mainframe exit and reimagined journeys
  • building enterprise-ready apps that are explainable, secure, scalable, and consistently developed

We talk about AI-driven development and the growing role of coding agents in everyday work, including tools such as GitHub Copilot, Codex, and Claude Code. Speed is great, but Matt explains why it can also create apps that aren’t explainable, hide vulnerabilities, and struggle to scale. The goal is to keep the acceleration while making the output enterprise-ready: transparent, deployable at massive scale, compliant, secure, and built consistently.

Cost control is the other make-or-break topic. Token-based pricing sounds simple until reasoning agents start consuming tokens unpredictably and vendors change their pricing models. Matt lays out an outcome-based approach to agentic pricing that focuses on work done and value delivered, aiming for predictable costs and predictable ROI so promising AI use cases don’t suddenly turn unprofitable.

We also dig into Pega Blueprint’s progress on legacy modernisation, including how AWS-powered analysis of legacy languages like COBOL can produce natural language understanding that feeds transformation work. If you care about mainframe exit, cloud modernisation, and reimagining customer journeys rather than lift-and-shift, you’ll find plenty to take away.

If you found this useful, subscribe, share it with a colleague, and leave a review so more builders and leaders can find the show. #PegaPartner

11 June 2026 · 4 min

Behind the Scenes at PegaWorld: A Conversation with Kara Manton

Legacy systems do not fail because teams lack ambition. They fail because nobody has the time to untangle years of code, edge cases and hidden business logic. We sit down with Kara Manton, business director in Pega’s product engineering function, to unpack the biggest PegaWorld announcements aimed at changing that reality, starting with why Pega Infinity 26 is being called one of the best releases in a decade.

TL;DR:

  • Infinity 26 as a major step forward for AI-powered workflow automation
  • Blueprint AI inside Infinity Studio and an AI assistant that builds rules behind the scenes
  • Calling Pega workflows from different AI tools while keeping execution predictable
  • AWS Transform plus Blueprint to modernise legacy code into production apps in three months
  • Designing business rules and user experience earlier to cut rework later
  • No token charging and a shift towards outcomes-based pricing

We talk through what it looks like when AI is designed to strengthen workflow automation rather than replace it. Kara explains how Pega Blueprint has evolved from an early idea into a deeper application design experience where you can shape process flows, business rules and user experience before you build.

We also dig into Infinity Studio with its built-in AI assistant, where you can chat and have the system generate Pega rules behind the scenes, opening the door for more people to participate in creating workflow applications.

The conversation turns to two big enterprise concerns: modernisation speed and AI cost. Kara highlights the on-stage AWS Transform announcement, describing how AWS Transform combined with Blueprint can take organisations from a legacy code base to a production app in three months. We also cover Pega’s decision not to charge for tokens, focusing instead on outcomes and predictable cost in a world where tokenomics and model changes can feel chaotic.

If you care about practical, governed AI, agentic workflows and faster legacy transformation, this one is for you. Subscribe, share with your team, and leave a review with the workflow problem you want to modernise next. #PegaPartner

11 June 2026 · 5 min