Inference Time Tactics
In this episode of Inference Time Tactics, Cooper and Byron break down NeuroMetric's Thinking Algorithm Leaderboard and what it reveals about building production-ready AI agents. They share why prompt engineering with a single model won't cut it for enterprise use cases, explore the impact of inference-time compute strategies, and discuss what they learned from testing 10 models across real CRM tasks, from surprising token inefficiency to catastrophic failures in SQL generation.

We talked about:
* Why NeuroMetric built the first leaderboard combining models with inference-time compute strategies.
* How Salesforce's CRMArena-Pro reflects real multi-step business tasks better than pure reasoning benchmarks.
* The jagged frontier: no single model or technique dominates across all tasks.
* Why GPT 20B was surprisingly token inefficient: twice as slow as GPT 120B for similar accuracy.
* How GPT-5 nano's conversational style broke SQL generation tasks completely.
* Trading accuracy for speed: two-model ensembles versus five, and saving 20+ seconds per task.
* Throughput constraints as a hidden bottleneck when scaling to production volumes.
* Future directions: LLM-guided search, task clustering, and compression to specialized small models.
Resources Mentioned:
* CRMArena-Pro from Salesforce: https://www.salesforce.com/blog/crmarena-pro/
* Thinking Algorithm Leaderboard: https://leaderboard.neurometric.ai/

Connect with Neurometric:
* Website: https://www.neurometric.ai/
* Substack: https://neurometric.substack.com/
* X: https://x.com/neurometric/
* Bluesky: https://bsky.app/profile/neurometric.bsky.social

Host:
* Calvin Cooper: https://x.com/cooper_nyc_ and https://www.linkedin.com/in/coopernyc

Guest:
* Byron Galbraith: https://x.com/bgalbraith and https://www.linkedin.com/in/byrongalbraith
14 episodes