AI Bites: The Academic Series
If the Transformer architecture gave us the engine for modern AI, this episode is all about the fuel. We are diving into the single most consequential paradigm shift in modern NLP: pre-training. We explore how these massive models are trained, the distinct architectures they use, and the surprising emergent behaviors that appear when we scale them up.

Key Topics:
* The Context Problem & Subwords: Why static word embeddings like Word2Vec fall short (each word gets a single vector, whatever the context), and how Byte-Pair Encoding (BPE) addresses the "Unknown Token" problem by breaking novel words into familiar chunks.
* What Pre-training Actually Teaches: How the simple task of reconstructing masked or missing text forces models to learn trivia, syntax, and some arithmetic, while also absorbing the internet's harmful biases.
* The 3 Core Architectures: A breakdown of Encoders (BERT and the 80/10/10 rule), Decoders (the GPT family's autoregressive generation), and Encoder-Decoders (T5's span corruption).
* Scaling Laws & The Chinchilla Revelation: How OpenAI demonstrated in-context learning at scale with GPT-3, and how DeepMind later showed that the earlier scaling recipe was off: for a fixed compute budget, smaller models trained on far more data yield better results.

Note: This is an AI-generated discussion created using Google's NotebookLM, based on publicly available Stanford University course material (specifically CS224N) and personal study notes from my learning journey.
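For listeners who want something concrete before pressing play, here is a minimal sketch of the 80/10/10 rule named in the Encoders topic. It assumes the commonly described BERT recipe: roughly 15% of tokens are selected for prediction, and of those, 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged. The function name `bert_mask`, the toy vocabulary, and the plain-string tokens are illustrative assumptions, not code from the episode or from any library.

```python
import random

MASK = "[MASK]"

def bert_mask(tokens, vocab, select_prob=0.15, rng=None):
    """Corrupt a token list with the 80/10/10 rule (illustrative sketch).

    Returns (corrupted, labels). labels[i] holds the original token at
    positions the model must predict, and None everywhere else.
    """
    if rng is None:
        rng = random.Random()
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= select_prob:
            continue  # not selected: no corruption, no loss at this position
        labels[i] = tok  # selected: the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK  # 80%: replace with the mask token
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: leave the original token in place
    return corrupted, labels

# Hypothetical usage on a toy sentence; output varies from run to run.
toy_vocab = ["the", "cat", "sat", "on", "mat", "dog"]
sentence = ["the", "cat", "sat", "on", "the", "mat"]
print(bert_mask(sentence, toy_vocab))
```

Leaving some selected tokens unchanged or swapping in random ones means the model cannot rely on seeing `[MASK]` to know which positions matter, a token that never appears in downstream fine-tuning data.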
48 episodes