No Math AI

Inference Time Scaling for Enterprises

9 min · 16. juni 2025
episode Inference Time Scaling for Enterprises cover

Beskrivelse

In Episode 3 of No Math AI, Red Hat CEO Matt Hicks and CTO Chris Wright join hosts Akash Srivastava and Isha Puri to explore what it really takes to scale large language model inference time scaling in production. From cost concerns and platform orchestration to the launch of llm-d, they break down the transition from static models to dynamic, reasoning-heavy applications and how open source collaboration is making scalable AI a reality for enterprise teams.

Kommentarer

0

Vær den første til at kommentere

Tilmeld dig nu og bliv en del af No Math AI-fællesskabet!

Kom i gang

2 måneder kun 19 kr.

Derefter 99 kr. / måned · Opsig når som helst.

  • Podcasts kun på Podimo
  • 20 lydbogstimer pr. måned
  • Gratis podcasts

Alle episoder

3 episoder

episode Why Inference-Time Scaling? cover

Why Inference-Time Scaling?

In our first episode of No Math AI, Akash and Isha are joined by guest research engineers, Shivchander Sudalairaj, GX Xu, and Kai Xu, to discuss a crucial topic that’s making waves in AI performance: inference-time scaling. Simple put, inference-time scaling is a cost-effective method for improving AI model performance. Discover how this technique enhances reasoning in smaller language models, powers agentic AI, and ensures higher accuracy in mission-critical applications where precision is key. The discussion covers how inference-time scaling boosts model performance and decision-making in AI systems. Our guests also highlight a groundbreaking research paper that unveils how a probabilistic approach to selecting the best answers in reasoning models can significantly enhance accuracy. Read the research paper: https://probabilistic-inference-scaling.github.io/ [https://probabilistic-inference-scaling.github.io/] Guests: * Shivchander Sudalairaj * GX Xu * Kai Xu

18. mar. 202523 min