Inside Open Networking by STORDIS – the podcast where tech meets real life
Recorded live at the OCP Regional Summit Dublin 2025, this episode features Nanda Ravindran (VP of Technical Sales, Edgecore Networks) sharing hands-on, real-world insights into tuning AI-scale network fabrics with SONiC.

Learn how Edgecore benchmarks and optimizes 800G AI switches in SONiC, and why consistent, repeatable tuning (plus validation under realistic load) is critical for stable AI network performance.

* AI workload characteristics and the fabric performance challenges they introduce
* Step-by-step SONiC tuning: PFC, ECN, and DLB configuration fundamentals
* Using Spirent test equipment to generate realistic AI traffic profiles and stress conditions
* What changes performance: topology choices, link failures, VXLAN overlays, and traffic patterns
* Flowlet mode vs. hash mode: which delivers better outcomes for AI use cases
* Why automation, repeatable test methods, and community best practices matter at AI scale
* Edgecore’s open networking approach: collaborating with Broadcom on Enterprise SONiC for next-gen AI deployments

Session outline:
00:00 Intro - Nanda Ravindran & session overview
01:00 Why AI fabric tuning matters - 800G benchmarking + recurring performance gaps
02:00 AI workload traits - elephant flows, low entropy, load-balancing pressure; goal: lossless + low latency
03:00 SONiC tuning focus - RoCEv2 mapping + PFC, ECN, DLB
04:00 Testbed overview - 6× Edgecore 800G (TH5), SONiC 202311-based, non-blocking fabric
05:00 Spirent methodology - AI workload emulation, collectives, measurements
06:00 PFC configuration - QoS profiles (DSCP→TC→Queue/PG), bindings, enablement
08:00 ECN configuration - WRED profile, thresholds, drop probability sweeps
09:00 DLB explained - hash vs flowlet; why flowlet tuning matters
10:00 Key findings - PFC-only best in lab; PFC+ECN required for deployments
12:00 ECN result highlight - example best setting (1% drop, 2MB/10MB thresholds)
13:00 800G vs 400G/breakout - native 800G performs better for AI workloads
14:00 Failure + VXLAN tests - link failures hurt; VXLAN shows minimal impact
15:00 Collectives + PXN - PXN best; flowlet recovers faster than hash
16:00 Call to action - automation + repeatable community best practices
18:00 Q&A - question on newer enhanced DLB/ECMP; plan to test on newer SONiC

📬 Questions or support: support@stordis.com | 🌐 www.stordis.com

Let’s get social
💻 Blog: https://stordis.com/blog/
📘 Facebook: https://www.facebook.com/people/STORDIS-GmbH/100057058555819/
📸 Instagram: https://www.instagram.com/stordis_open_networking/
👥 LinkedIn: https://www.linkedin.com/company/stordis/
🐦 X: https://twitter.com/STORDIS_GmbH/

#SONiC #AIFabricTuning #Edgecore #800GSwitches #OCPDublin2025 #ECN #PFC #DLB #AIWorkloads #SONiCOptimization #OpenNetworking #EnterpriseSONiC #Broadcom #FlowletMode #NetworkAutomation #AIInfrastructure
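
Listener note on the 12:00 ECN highlight (1% drop, 2MB/10MB thresholds): the sketch below shows how a textbook WRED-style ramp would turn those three numbers into an ECN marking probability at a given queue depth. It is an illustration only. It assumes 2MB is the minimum threshold, 10MB is the maximum threshold, "MB" means MiB, and marking rises linearly between the two; the episode notes do not confirm any of these, and the function name `ecn_mark_probability` is hypothetical, not a SONiC API.

```python
MIB = 1024 * 1024  # assumption: "MB" in the show notes is treated as MiB


def ecn_mark_probability(queue_bytes, min_th=2 * MIB, max_th=10 * MIB, max_prob=0.01):
    """Textbook WRED ramp (hypothetical helper, not taken from the talk).

    Below min_th nothing is marked, between min_th and max_th the marking
    probability rises linearly from 0 to max_prob, and at or above max_th
    every ECN-capable packet is marked.
    """
    if min_th >= max_th:
        raise ValueError("min_th must be smaller than max_th")
    if queue_bytes < min_th:
        return 0.0
    if queue_bytes >= max_th:
        return 1.0
    return max_prob * (queue_bytes - min_th) / (max_th - min_th)


if __name__ == "__main__":
    # Print the marking probability at a few example queue depths.
    for depth_mib in (1, 2, 6, 9, 12):
        p = ecn_mark_probability(depth_mib * MIB)
        print(f"{depth_mib:>2} MiB queued: mark probability {p:.4f}")
```

Under these assumptions, a queue sitting halfway between the thresholds (6 MiB) is marked at half of the 1% maximum, which is one way to picture why sweeping the thresholds and the drop probability together shifts how early senders are told to slow down.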