Inference Time Scaling for Enterprises

9 min · June 16, 2025

Description

In Episode 3 of No Math AI, Red Hat CEO Matt Hicks and CTO Chris Wright join hosts Akash Srivastava and Isha Puri to explore what it really takes to run inference-time scaling for large language models in production. From cost concerns and platform orchestration to the launch of llm-d, they break down the transition from static models to dynamic, reasoning-heavy applications and explain how open source collaboration is making scalable AI a reality for enterprise teams.

All episodes

3 episodes

Inference Time Scaling for Enterprises

June 16, 2025 · 9 min

Generative Optimization

In this episode of No Math AI, we're joined by Dr. Faez Ahmed, a professor at MIT and leader of the Design Computation and Digital Engineering Lab. He works at the intersection of generative AI, optimization, and engineering design, where he's redefining how we create everything from bicycles to next-generation aerospace systems. Together, Isha, Akash, and Faez discuss the future of engineering work, harnessing "generative optimization" to automate engineering design, balancing the need for precision with the need for creativity, and more.

April 23, 2025 · 25 min

Why Inference-Time Scaling?

In our first episode of No Math AI, Akash and Isha are joined by guest research engineers Shivchander Sudalairaj, GX Xu, and Kai Xu to discuss a topic that's making waves in AI performance: inference-time scaling. Simply put, inference-time scaling is a cost-effective method for improving AI model performance. Discover how this technique enhances reasoning in smaller language models, powers agentic AI, and supports higher accuracy in mission-critical applications where precision is key. The discussion covers how inference-time scaling boosts model performance and decision-making in AI systems. Our guests also highlight a research paper showing how a probabilistic approach to selecting the best answers in reasoning models can significantly improve accuracy. Read the research paper: https://probabilistic-inference-scaling.github.io/ Guests: Shivchander Sudalairaj, GX Xu, Kai Xu

March 18, 2025 · 23 min
