The Manny Bernabe Show
In this episode, we dive into NVIDIA's bold push to make Python a first-class citizen in its GPU ecosystem, which guest Charles Frye calls "The Year of CUDA Python." Charles, a developer advocate at Modal, recaps key takeaways from the 2025 NVIDIA GTC conference, spotlighting the growing centrality of Python across CUDA tooling, including the debut of Python-first libraries like cuTile and a fully reworked Python interface for CUTLASS.

We explore why NVIDIA is embracing Python for performance-critical development, how the company is addressing the challenges of Tensor Core programming, and what this all means for AI builders. Charles also breaks down NVIDIA’s hardware strategy shift, which favors scale-up over scale-out, and covers powerful new profiling tools like Nsight Systems and Torch Profiler. Plus, we look at distributed inference innovations like Dynamo and how they intersect with platforms like Modal.

Whether you're GPU-curious or deep into LLM infrastructure, this conversation offers insight into how NVIDIA’s ecosystem is evolving, and why Python is at the center of it all.

Connect with Charles Frye
🧠 X (Twitter): @charles_irl (https://x.com/charles_irl)
💻 Try Modal: https://modal.com

[CHAPTERS]
00:00 Start
01:35 The Year of CUDA Python
01:56 NVIDIA's Software Stack Evolution
03:01 Python's Growing Role in GPU Programming
06:11 CUTLASS and Python Integration
08:30 Tensor Cores and CUDA Complexity
12:02 Scaling Up vs. Scaling Out
18:01 AI Factory Concept
20:42 Hopper GPUs and New Generations
23:44 Memory-Bound Challenges in GPU Scaling
24:49 Performance and Tooling Insights
28:36 GPU Debugging Tools: Torch Profiler and Nsight Systems
33:43 Dynamo: Distributed Inference for Language Models
39:37 Introducing the Modal Platform
43:10 How to Connect and Get Started with Modal