EP8: Training Models at Scale | AWS for AI Podcast

Description

Join us for a conversation with Anton Alexander, AWS's Senior Specialist for Worldwide Foundation Models, about the complexities of training and scaling large foundation models. Anton brings expertise from working with the world's top model builders, along with his journey from Trinidad and Tobago to becoming a leading AI infrastructure expert. He shares practical insights on managing massive GPU clusters, optimizing distributed training, and handling the critical challenges of model development at scale, including GPU failure detection, checkpointing strategies, and the evolution of inference workloads. He also gives an insider's perspective on emerging trends like GRPO, visual LLMs, and the future of AI model development. This technical deep dive covers real-world solutions for building and deploying foundation models, from low-level infrastructure optimization to high-level AI development strategies.

Learn more: http://go.aws/47yubYq
Amazon SageMaker HyperPod: https://aws.amazon.com/fr/sagemaker/ai/hyperpod/
The Llama 3 Herd of Models paper: https://arxiv.org/abs/2407.21783

Chapters:
00:00:00 : Introduction and Guest Background
00:01:18 : Anton's Journey from the Caribbean to AI
00:05:52 : Mathematics in AI
00:07:20 : Large Model Training Challenges
00:09:54 : GPU Failures: The Llama 3 Herd of Models
00:13:40 : Grey Failures
00:15:05 : Model Training Trends
00:17:40 : Managing Mixture of Experts Models
00:21:50 : Estimating How Many GPUs You Need
00:25:12 : Monitoring the Loss Function
00:27:08 : Crashing Training Runs
00:28:10 : The SageMaker HyperPod Story
00:32:15 : How We Automate Managing Grey Failures
00:37:28 : Which Metrics to Optimize For
00:40:23 : Checkpointing Strategies
00:44:48 : USE: Utilization, Saturation, Errors
00:50:11 : SageMaker HyperPod for Inferencing
00:54:58 : Resiliency in Training vs Inferencing Workloads
00:56:44 : NVIDIA NeMo Ecosystem and Agents
00:59:49 : Future Trends in AI
01:03:17 : Closing Thoughts

EP9: Lucidya : AI For Enhanced Customer Experience and Social Listening | AWS for AI Podcast

Join us for an in-depth conversation with Dr. Zuhair Khayat, CTO and co-founder of Lucidya, an AI company focused on customer experience in the Middle East. From his academic work at KAUST to pioneering Arabic language AI solutions, Dr. Khayat shares insights on building technology for Arabic markets. He explains how Lucidya tackles the challenges of Arabic language processing, manages customer experience across dialects, and applies AI to brand management in the MENA region. The discussion also covers how Lucidya approaches social listening and customer intelligence, from sentiment analysis to real-time brand monitoring across multiple dialects.

Learn more: http://go.aws/47yubYq
Lucidya Website: http://go.aws/4846zeH
Dr. Khayat's Research: https://scholar.google.com/citations?user=v4RoDV0AAAAJ&hl=en

Chapters:
00:00 - Introduction
02:27 - Lucidya's Origin Story
05:20 - AI Startup Vision Before GenAI
07:30 - From Saudi Arabia to the World
08:11 - Lucidya’s Long-Term Mission
08:59 - What Is Customer Experience?
10:55 - Lucidya’s Value Proposition
12:18 - Use Cases: Brand Management
17:20 - Use Cases: Customer Support
19:14 - Use Cases: Market Research
20:43 - Use Cases: Marketing Optimization
21:52 - Building Products in the GenAI Era
25:58 - Keeping Up with AI Innovations
26:45 - Product Bundles from Research to Support
28:50 - Evolution of Challenges for Building AI Models
30:18 - ASAD: Arabic Sentiment Analysis Data
31:22 - Navigating Ambiguity in Human Communications
33:40 - Managing Biases in Data with Agents
38:02 - Arabic Dialects Support
40:41 - Split Learning: Collaborative Training on Private Data on the Cloud
47:10 - Opportunities for Split Learning in Regulated Industries
49:35 - U-Shaped Learning as a Collaborative Framework for Privacy
50:12 - A CTO’s Advice for AI Startups
53:20 - Choosing the Right Architecture on AWS for Your Team
55:45 - The Future of AI
58:06 - Closing Remarks

Oct 2, 2025, 58 min
