Inference Time Scaling for Enterprises

9 min · Jun 16, 2025

Description

In Episode 3 of No Math AI, Red Hat CEO Matt Hicks and CTO Chris Wright join hosts Akash Srivastava and Isha Puri to explore what it really takes to run inference-time scaling for large language models in production. From cost concerns and platform orchestration to the launch of llm-d, they break down the transition from static models to dynamic, reasoning-heavy applications, and how open source collaboration is making scalable AI a reality for enterprise teams.

All episodes

3 episodes

Inference Time Scaling for Enterprises

Jun 16, 2025 · 9 min

Generative Optimization

In this episode of No Math AI, we're joined by Dr. Faez Ahmed, a professor at MIT and leader of the Design Computation and Digital Engineering Lab. He works at the intersection of generative AI, optimization, and engineering design, where he's redefining how we create everything from bicycles to next-generation aerospace systems. Together, Isha, Akash, and Faez discuss the future of engineering work, harnessing "generative optimization" to automate engineering design, balancing the need for precision with the need for creativity, and more.

Apr 23, 2025 · 25 min

Why Inference-Time Scaling?

In our first episode of No Math AI, Akash and Isha are joined by guest research engineers Shivchander Sudalairaj, GX Xu, and Kai Xu to discuss a crucial topic that’s making waves in AI performance: inference-time scaling. Simply put, inference-time scaling is a cost-effective method for improving AI model performance. Discover how this technique enhances reasoning in smaller language models, powers agentic AI, and ensures higher accuracy in mission-critical applications where precision is key. Our guests also highlight a research paper showing how a probabilistic approach to selecting the best answers in reasoning models can significantly enhance accuracy.

Read the research paper: https://probabilistic-inference-scaling.github.io/

Guests: Shivchander Sudalairaj, GX Xu, Kai Xu

Mar 18, 2025 · 23 min
