Steven AI Talk
Stanford University CS336 Lecture 11: Application of Scaling Laws in Large Language Models and Maximal Update Parameterization

This lecture explores how modern large language model builders use scaling laws as part of their model design process. It covers case studies from relevant papers alongside the mathematical details of maximal update parameterization. After the release of the Chinchilla model, intensified industry competition led many frontier labs to stop publicly sharing specific details about data and model scaling. However, some highly capable research teams have still openly shared rigorous scaling-law studies from their large-scale training runs.

Key Takeaways:
* In the case of scaling strategies, the Cerebras-GPT series applied the Chinchilla recipe across para...

All my links: https://linktr.ee/learnbydoingwithsteven

#learnbydoingwithsteven #AI #DeepLearning #Research #TechSummary #MachineLearning #LLM #ScalingLaws #NeuralNetworks #Innovation
689 episodes