AI Bites: The Academic Series

EP 51 | CS224N: AI Reasoning (Part 2)

31 min · Yesterday
Description

Today, we are pushing the absolute limits of how Language Models generate text. We move beyond basic architecture to explore how engineers are making AI insanely fast, teaching models to recover from their own mistakes, expanding context windows so AI can read entire books, and proving that a small model can beat an industry giant just by "thinking" longer.

Key Topics:
* Speculative Decoding: How pairing a massive "Senior Architect" model with a tiny "Junior Coder" draft model speeds up AI text generation by up to 3x with zero loss in quality.
* On-Policy Distillation: Why teaching a student model to generate text and recover from its own mistakes (Reverse KL Divergence) is superior to blindly copying a teacher model.
* Extending the Context Window: The brilliant math behind Rotary Position Embedding (RoPE) and how manipulating the rotation of word vectors allows models to extrapolate and process massive documents.
* Inference-Time Scaling: The paradigm shift of test-time compute. We explain why letting models self-correct via Process-supervised Reward Models (PRMs) is the new frontier of AI efficiency.

Note: This is an AI-generated discussion created using Google's NotebookLM, based on publicly available Stanford University course material (specifically CS224N) and personal study notes from my learning journey.
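For readers who want to see why speculative decoding can claim "zero loss in quality", here is a minimal sketch of the greedy variant. It is an illustration under stated assumptions, not the episode's or any library's implementation: `target_model` and `draft_model` are hypothetical toy functions that map a token prefix to the next token, and the verification step is written as a loop where a real system would score all drafted positions in one batched forward pass. Sampling-based speculative decoding uses a probabilistic accept/reject rule instead of the exact-match check shown here.

```python
def greedy_decode(model, prompt, num_tokens):
    """Baseline: call the model once per generated token."""
    seq = list(prompt)
    for _ in range(num_tokens):
        seq.append(model(seq))
    return seq


def speculative_decode(target, draft, prompt, num_tokens, k=4):
    """Greedy speculative decoding with a draft block of up to k tokens."""
    seq = list(prompt)
    goal = len(prompt) + num_tokens
    while len(seq) < goal:
        # 1. The cheap draft model proposes up to k tokens.
        proposal = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. The target model checks each proposed position in order.
        for i, tok in enumerate(proposal):
            expected = target(seq + proposal[:i])
            if tok != expected:
                # 3. First mismatch: keep the target's token, drop the rest.
                seq.extend(proposal[:i] + [expected])
                break
        else:
            # Every drafted token matched, so the whole block is accepted.
            seq.extend(proposal)
    return seq


# Hypothetical toy models (assumptions for illustration only).
def target_model(seq):
    return (7 * seq[-1] + len(seq)) % 50


def draft_model(seq):
    # Agrees with the target except when the prefix length is a multiple of 5.
    guess = target_model(seq)
    return guess if len(seq) % 5 else (guess + 1) % 50


fast = speculative_decode(target_model, draft_model, [1, 2, 3], 20)
slow = greedy_decode(target_model, [1, 2, 3], 20)
```

Because every token that is kept is either confirmed or supplied by the target model, `fast` and `slow` are the same sequence; the speedup in practice comes from verifying a whole drafted block in a single pass of the large model.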

All episodes

55 episodes

Video Short: Tokenization & Multilinguality

A bite-sized, visual breakdown of CS224N Lecture 14! In this new NotebookLM Video Short, we pull back the curtain on the invisible preprocessing layer of modern AI: Tokenization.

Key Topics:
* The "Strawberry" Problem: A visual look at why ChatGPT can't count letters or spell backwards due to opaque token chunks.
* The Multilingual Tax: A direct comparison showing how English-biased tokenizers shatter non-English prompts (like Thai or Somali) into dozens of inefficient fragments, forcing global users to pay more money for worse AI performance.
* The Return to Bytes: A quick look at next-generation tokenizer-free architectures (like Google's CANINE, and MrT5 with its dynamic byte dropping) that aim to fix this massive inequality.

Note: This is an AI-generated visual discussion created using Google's NotebookLM, based on publicly available Stanford University course material (specifically CS224N) and personal study notes from my learning journey.

Yesterday · 1 min

EP 52 | CS224N: Tokenization & Multilinguality

Language models do not actually read text; they read tokens. In this episode, we explore the invisible preprocessing layer that Andrej Karpathy says is "at the heart of much weirdness of LLMs." We demystify the Tokenization problem, explain why your AI can't count letters, and discuss the massive socio-economic inequalities baked into modern AI pricing.

Key Topics:
* The BPE Algorithm: How Byte Pair Encoding finds the "Goldilocks" zone between infinite character sequences and rigid word vocabularies by merging frequent bytes.
* Strawberries & Glitch Tokens: Why ChatGPT confidently fails to spell the word "strawberry," and what the "SolidGoldMagikarp" glitch token reveals about adversarial vulnerabilities.
* Cross-Lingual Transfer & The Capacity Curse: How an AI trained on English sentiment can zero-shot evaluate French, but degrades in overall performance when forced to learn too many languages at once.
* The Tokenization Tax: The stark reality of Subword Fertility. We explain how English-biased tokenizers unfairly overcharge non-English speakers, slowing down processing speeds and degrading output quality for the global majority.

Note: This is an AI-generated discussion created using Google's NotebookLM, based on publicly available Stanford University course material (specifically CS224N) and personal study notes from my learning journey.
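The "merging frequent bytes" idea behind BPE can be sketched in a few lines. This is a toy training loop under simplifying assumptions, not a production tokenizer: it works on a single string rather than a pre-tokenized corpus, and the helper names (`most_frequent_pair`, `merge_pair`, `train_bpe`) are made up for this illustration.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Return the most common adjacent pair, or None if there are no pairs."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(tokens, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from single UTF-8 bytes."""
    tokens = [bytes([b]) for b in text.encode("utf-8")]
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        tokens = merge_pair(tokens, pair)
        merges.append(pair)
    return tokens, merges


tokens, merges = train_bpe("aaabdaaabac", num_merges=3)
```

Each merge adds one entry to the vocabulary and shortens the sequence, which is the trade-off the "Goldilocks" framing refers to: more merges mean fewer tokens per text but a larger vocabulary.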

Yesterday · 51 min

EP 50 | CS224N: Reasoning Part 1

How does a language model actually "think"? In this episode, we dive into the fascinating mechanics of AI reasoning. We move past basic text prediction to explore how modern models generate complex, multi-step logic, self-correct their own mistakes, and fundamentally change how we scale compute.

Key Topics:
* Decoding the Text: Why generation isn't magic, it's an algorithm. We contrast deterministic strategies like Greedy Decoding and Beam Search with open-ended sampling techniques.
* The DeepSeek R1 Breakthrough: How the industry proved that state-of-the-art reasoning can be achieved by open-weight models, and how logic is successfully distilled into much smaller architectures.
* GRPO & Emergent Reasoning: Unpacking Group Relative Policy Optimization, and taking a look at a model's messy, self-correcting "inner monologue."
* Test-Time Compute: The biggest paradigm shift of the year. We explain how models are moving beyond massive training runs to simply "thinking longer" during inference to solve incredibly complex problems.

Note: This is an AI-generated discussion created using Google's NotebookLM, based on publicly available Stanford University course material (specifically CS224N) and personal study notes from my learning journey.
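The "group relative" part of GRPO can be illustrated with a small sketch: several answers are sampled for the same prompt, and each answer's reward is scored against the group's own mean and spread instead of a learned value function. The function name, the use of the population standard deviation, and the `eps` constant are assumptions made for this example; actual training code varies in these details and adds the policy-gradient and KL terms that are not shown here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt: two judged correct (1.0), two wrong (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# If every answer gets the same reward, no answer is preferred.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Answers that beat the group average receive a positive advantage and are reinforced, while below-average answers are pushed down; a group with identical rewards contributes no learning signal.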

Jun 25, 2026 · 51 min

EP 49 | CS224N: Benchmarking and Evaluation

We spend so much time building massive AI models, but how do we actually know if they are any good? In this episode, we tackle the multi-billion-dollar scientific bottleneck: evaluation. We explore why the science of measuring models is lagging far behind the engineering of building them, and why hitting 100% on a test doesn't mean what you think it means.

Key Topics:
* The Benchmark Saga: How the industry moved from basic language understanding (GLUE) to insanely difficult graduate-level tests (GPQA) as models consistently shattered human ceilings.
* How Models Cheat: A look at "spurious biases" and annotation artifacts. We explain how lazy human data labeling taught models to cheat on reading comprehension tests using lexical overlap and negation bias.
* The Metrics Spectrum: Why classical n-gram overlap metrics (like BLEU) are totally blind to semantics, and why modern neural metrics (like BERTScore) are dangerously blind to factual hallucinations.
* The Algorithmic Courtroom: The rise of LLMs acting as judges for other LLMs. We break down their native biases, like nepotism and verbosity preference, and why multi-model juries are the new gold standard.

Note: This is an AI-generated discussion created using Google's NotebookLM, based on publicly available Stanford University course material (specifically CS224N) and personal study notes from my learning journey.

Jun 25, 2026 · 16 min