No Math AI

Inference Time Scaling for Enterprises

9 min · 16 Jun 2025

Description

In Episode 3 of No Math AI, Red Hat CEO Matt Hicks and CTO Chris Wright join hosts Akash Srivastava and Isha Puri to explore what it really takes to run inference-time scaling for large language models in production. From cost concerns and platform orchestration to the launch of llm-d, they break down the transition from static models to dynamic, reasoning-heavy applications and how open source collaboration is making scalable AI a reality for enterprise teams.

All episodes

3 episodes

Why Inference-Time Scaling?

In our first episode of No Math AI, Akash and Isha are joined by guest research engineers Shivchander Sudalairaj, GX Xu, and Kai Xu to discuss a topic that is making waves in AI performance: inference-time scaling. Simply put, inference-time scaling is a cost-effective method for improving AI model performance. Discover how this technique enhances reasoning in smaller language models, powers agentic AI, and supports higher accuracy in mission-critical applications where precision is key. Our guests also highlight a research paper showing how a probabilistic approach to selecting the best answers in reasoning models can significantly improve accuracy. Read the research paper: https://probabilistic-inference-scaling.github.io/

Guests: Shivchander Sudalairaj, GX Xu, Kai Xu

23 min · 18 Mar 2025