Inference Time Tactics

Podcast by NeuroMetric AI

English

Technology and science

About Inference Time Tactics

A podcast exploring the emerging field of inference-time compute—the next frontier in AI performance. Hosted by the Neurometric team, we unpack how models reason, make decisions, and perform at runtime. For developers, researchers, and operators building AI infrastructure.

All episodes

15 episodes

Automating the SLM Development Loop: Inside the Pioneer Agent Paper with Yash Sharma of Neurometric AI

In this episode of Inference Time Tactics, Cooper sits down with Yash Sharma, Head of AI Research at Neurometric AI, to break down the Pioneer Agent paper from Fastino Labs, a system that uses Claude Sonnet as an ML engineer in a box to build and improve small language models end-to-end. From cold start data curation to production failure diagnosis, the paper argues the real bottleneck in SLM creation isn't training but everything around it.

We talked about:

* What Pioneer Agent actually is and why Fastino built a 25-page system paper instead of just a training paper.
* How Neurometric is already tackling the same problems, and where the paper maps onto our approach.
* Why naive retraining on production data quietly degrades your model, and how an agentic loop fixes it.
* What the benchmarks reveal about where SLMs work out of the box versus where they need intervention.
* Where Pioneer Agent hits its limits, and how Neurometric is pushing SLMs toward harder agentic tasks.

Connect with Neurometric:
Website: https://www.neurometric.ai/
Substack: https://neurometric.substack.com/
X: https://x.com/neurometric/
Bluesky: https://bsky.app/profile/neurometric.bsky.social

Host:
Calvin Cooper
https://x.com/cooper_nyc_
https://www.linkedin.com/in/coopernyc

Guest:
Yash Sharma
https://x.com/yash_j_sharma
https://www.linkedin.com/in/yashjsharma

Jun 16, 2026 - 32 min

SLMs Beat GPT-4o: How AI Is Being Used in Hiring with Fletcher Wimbush, CEO of Discovered AI

In this episode of Inference Time Tactics, Cooper sits down with Fletcher Wimbush, CEO of Discovered AI, to get into what’s actually broken about hiring, and how 13 years of bootstrapped recruiting, 10,000 interviews, and four peer-reviewed journal articles became the foundation for a platform now outperforming its biggest competitors. With hundreds of applicants flooding every open role and general AI tools hitting the ceiling at 70-80% resume screening accuracy, Discovered is combining behavioral science with fine-tuned small language models to get hiring right the first time.

We talked about:

* The hidden cost of over-relying on qualifications over attitude and integrity, and why Warren Buffett says a smart person with low integrity is your biggest threat.
* How 10,000 interviews and four peer-reviewed journal articles with Bowling Green State University gave Discovered’s hiring system its scientific backbone.
* Fine-tuned small language models hitting 98.5% accuracy on resume screening where GPT-4o topped out at 70-80%, at a fraction of the cost.
* What’s coming next on the Discovered roadmap: smarter resume database search, AI-powered parsing, and an AI interviewer built to ask the right follow-up questions.

Connect with Discovered AI:
Website: https://discovered.ai
LinkedIn: https://www.linkedin.com/in/fletcher-wimbush
Email: fletcher@discovered.ai

Connect with Neurometric:
Website: https://www.neurometric.ai/
Substack: https://neurometric.substack.com/
X: https://x.com/neurometric/
Bluesky: https://bsky.app/profile/neurometric.bsky.social

Hosts:
Calvin Cooper
https://x.com/cooper_nyc_
https://www.linkedin.com/in/coopernyc

May 29, 2026 - 28 min

Voice Intelligence at Scale: From Call of Duty to Fraud Detection with Modulate AI

Every day, billions of voice conversations happen across games, customer service calls, and financial transactions. Almost none of them are understood by machines. In this episode of Inference Time Tactics, Calvin Cooper and Yash Sharma sit down with Carter Huffman, CTO and co-founder of Modulate, to explore the AI systems that can finally understand voice conversations in real time.

Modulate’s model Velma 2.0 powers voice intelligence across industries. From moderating voice chat in games like Call of Duty to detecting fraud in financial calls and analyzing customer support conversations, their system uses ensembles of specialized models to capture tone, intent, emotion, and conversational dynamics. Instead of relying on giant foundation models, Velma orchestrates over 100 specialized models to deliver higher accuracy at dramatically lower cost.

We talked about:

* The challenge of processing a trillion hours of annual global voice traffic.
* Scaling real-time moderation for massive platforms like Call of Duty.
* Capturing nuance, tone, and sarcasm beyond basic text transcripts.
* Ensemble architecture utilizing over 100 specialized models.
* Orchestration layers that trim compute costs by identifying optimal model subsets.
* Achieving order-of-magnitude cost savings compared to large foundational models.
* Applying "exploration vs. exploitation" optimization to shifting conversation data.
* Future development of "context graphs" to map participant intent and causality.

Resources Mentioned:
NeuroMetric Audio Leaderboard: https://leaderboard.neurometric.ai/?leaderboard=audio

Connect with Modulate:
Website: https://www.modulate.ai/
LinkedIn: https://www.linkedin.com/in/carter-huffman-a9aba05b
Velma: https://www.modulate.ai/velma

Connect with Neurometric:
Website: https://www.neurometric.ai/
Substack: https://neurometric.substack.com/
X: https://x.com/neurometric/
Bluesky: https://bsky.app/profile/neurometric.bsky.social

Hosts:
Calvin Cooper
https://x.com/cooper_nyc_
https://www.linkedin.com/in/coopernyc

Yash Sharma
https://x.com/yash_j_sharma
https://www.linkedin.com/in/yashjsharma/

Mar 9, 2026 - 32 min

From GPU Scarcity to GPU Waste: Solving the Utilization Crisis

In this episode of Inference Time Tactics, Cooper and Byron sit down with Charlie and Anil from Rapt AI to tackle one of the industry's most expensive problems: GPU underutilization. With half a trillion dollars invested in GPU infrastructure running at just 20-30% utilization, Rapt AI is building AI-powered orchestration that automatically analyzes workloads and matches them to the right compute resources, with no guesswork required.

We talked about:

* Why half a trillion dollars in GPU infrastructure runs at only 20-30% utilization, and how a 5% drop costs $200,000 per $2M investment.
* How Rapt AI's platform continuously analyzes workloads and auto-optimizes GPU allocation, letting customers run 4-14 models per GPU.
* Real results: moving workloads from H100s to A100s at 40% of the cost, and reducing GPU footprints from 184 to under 50 while improving performance.
* Why 2026 becomes the year of inference as agentic workloads create unprecedented infrastructure chaos.
* The shift from supply problems to optimization problems, and why abstraction layers matter across multi-vendor environments.
* Power as the next crisis: tokens-per-watt emerging as the critical metric alongside tokens-per-dollar.
* How intelligent orchestration frees up data scientists and ML ops teams from infrastructure tuning to focus on AI innovation.

Connect with Rapt AI:
Website: https://www.rapt.ai/
LinkedIn (Anil Ravindranath): https://www.linkedin.com/in/anilravindranath
LinkedIn (Charlie Leeming): https://www.linkedin.com/in/charlieleeming/

Connect with Neurometric:
Website: https://www.neurometric.ai/
Substack: https://neurometric.substack.com/
X: https://x.com/neurometric/
Bluesky: https://bsky.app/profile/neurometric.bsky.social

Hosts:
Calvin Cooper
https://x.com/cooper_nyc_
https://www.linkedin.com/in/coopernyc

Byron Galbraith
https://x.com/bgalbraith
https://www.linkedin.com/in/byrongalbraith

Jan 16, 2026 - 40 min

Lessons from the Leading Edge: What 420 AI Deployments Reveal About Enterprise Success

In this episode of Inference Time Tactics, Rob, Cooper, and Byron sit down with Shawn Rogers, CEO of BARC US, to unpack fresh data from 421 organizations actively deploying AI in production. Shawn shares what separates the 20% of AI leaders from everyone else, why cost surprises are hitting harder than expected, and how the pressure to "just do AI" is causing companies to skip critical foundations, often to their detriment.

We talked about:

* Why multi-model strategies and small language models are becoming essential for enterprise AI.
* The seven foundational areas that help AI leaders deploy twice as many projects as everyone else.
* Why 51% of deployments face unexpected cost overruns, and which expenses hit hardest.
* Data quality jumping to the #1 challenge, affecting 44% of production deployments.
* The IT satisfaction paradox: top resource at the start, lowest satisfaction scores at scale.
* How responsible AI priorities shifted as human-in-the-loop dropped from 36% to 21%.

Resources Mentioned:
Lessons from the Leading Edge: Successful Delivery of AI/GenAI
https://barc.com/research/successful-ai-genai-delivery/

Connect with BARC:
Website: https://barc.com/
LinkedIn (Shawn Rogers): https://www.linkedin.com/in/shawnrogers/

Connect with Neurometric:
Website: https://www.neurometric.ai/
Substack: https://neurometric.substack.com/
X: https://x.com/neurometric/
Bluesky: https://bsky.app/profile/neurometric.bsky.social

Hosts:
Rob May
https://x.com/robmay
https://www.linkedin.com/in/robmay

Calvin Cooper
https://x.com/cooper_nyc_
https://www.linkedin.com/in/coopernyc

Byron Galbraith
https://x.com/bgalbraith
https://www.linkedin.com/in/byrongalbraith

Dec 22, 2025 - 44 min