Inference Time Scaling for Enterprises

9 min · 16 Jun 2025

Description

In Episode 3 of No Math AI, Red Hat CEO Matt Hicks and CTO Chris Wright join hosts Akash Srivastava and Isha Puri to explore what it really takes to run inference-time scaling for large language models in production. From cost concerns and platform orchestration to the launch of llm-d, they break down the transition from static models to dynamic, reasoning-heavy applications and how open source collaboration is making scalable AI a reality for enterprise teams.

All episodes

3 episodes

Inference Time Scaling for Enterprises

16 Jun 2025 · 9 min

Generative Optimization

In this episode of No Math AI, we're joined by Dr. Faez Ahmed, a professor at MIT and leader of the Design Computation and Digital Engineering Lab. He works at the fascinating intersection of generative AI, optimization, and engineering design, where he's redefining how we create everything from bicycles to next-generation aerospace systems. Together, Isha, Akash, and Faez discuss the future of engineering work, harnessing "generative optimization" to automate engineering design, balancing the needs for precision and creativity, and more.

23 Apr 2025 · 25 min

Why Inference-Time Scaling?

In our first episode of No Math AI, Akash and Isha are joined by guest research engineers Shivchander Sudalairaj, GX Xu, and Kai Xu to discuss a crucial topic that's making waves in AI performance: inference-time scaling. Simply put, inference-time scaling is a cost-effective method for improving AI model performance. Discover how this technique enhances reasoning in smaller language models, powers agentic AI, and ensures higher accuracy in mission-critical applications where precision is key. The discussion covers how inference-time scaling boosts model performance and decision-making in AI systems. Our guests also highlight a research paper showing how a probabilistic approach to selecting the best answers in reasoning models can significantly enhance accuracy.

Read the research paper: https://probabilistic-inference-scaling.github.io/

Guests:
* Shivchander Sudalairaj
* GX Xu
* Kai Xu

18 Mar 2025 · 23 min
