EP 50 | CS224N: Reasoning Part 1

51 min · 25 June 2026

Description

How does a language model actually "think"? In this episode, we dive into the fascinating mechanics of AI reasoning. We move past basic text prediction to explore how modern models generate complex, multi-step logic, self-correct their own mistakes, and fundamentally change how we scale compute.

Key Topics:
* Decoding the Text: Why generation isn't magic, it's an algorithm. We contrast deterministic strategies like Greedy Decoding and Beam Search with open-ended sampling techniques.
* The DeepSeek R1 Breakthrough: How the industry proved that state-of-the-art reasoning can be achieved by open-weight models, and how logic is successfully distilled into much smaller architectures.
* GRPO & Emergent Reasoning: Unpacking Group Relative Policy Optimization, and taking a look at a model's messy, self-correcting "inner monologue."
* Test-Time Compute: The biggest paradigm shift of the year. We explain how models are moving beyond massive training runs to simply "thinking longer" during inference to solve incredibly complex problems.

Note: This is an AI-generated discussion created using Google's NotebookLM, based on publicly available Stanford University course material (specifically CS224N) and personal study notes from my learning journey.

Sign up now and join the AI Bites: The Academic Series community!

All episodes

52 episodes

EP 50 | CS224N: Reasoning Part 1

25 June 2026 · 51 min

EP 49 | CS224N: Benchmarking and Evaluation

We spend so much time building massive AI models, but how do we actually know if they are any good? In this episode, we tackle the multi-billion-dollar scientific bottleneck: evaluation. We explore why the science of measuring models is lagging far behind the engineering of building them, and why hitting 100% on a test doesn't mean what you think it means.

Key Topics:
* The Benchmark Saga: How the industry moved from basic language understanding (GLUE) to insanely difficult graduate-level tests (GPQA) as models consistently shattered human ceilings.
* How Models Cheat: A look at "spurious biases" and annotation artifacts. We explain how lazy human data labeling taught models to cheat on reading comprehension tests using lexical overlap and negation bias.
* The Metrics Spectrum: Why classical n-gram overlap metrics (like BLEU) are totally blind to semantics, and why modern neural metrics (like BERTScore) are dangerously blind to factual hallucinations.
* The Algorithmic Courtroom: The rise of LLMs acting as judges for other LLMs. We break down their native biases, such as nepotism and verbosity preference, and why multi-model juries are the new gold standard.

Note: This is an AI-generated discussion created using Google's NotebookLM, based on publicly available Stanford University course material (specifically CS224N) and personal study notes from my learning journey.

25 June 2026 · 16 min

EP 48 | CS224N: RAG and Language Agents

Up until now, we’ve looked at Language Models as isolated brains trapped in a box. In this episode, we cross the threshold into the absolute bleeding edge of AI: giving models a search engine to browse the web, memory to remember past conversations, and tools to execute code. We break down the inner workings of Retrieval-Augmented Generation (RAG) and the anatomy of truly autonomous Language Agents.

Key Topics:
* The Knowledge Problem & RAG: Why forcing LLMs to memorize everything leads to hallucinations, how the Retriever-Reader framework (DPR vs. BM25) fixes it, and why stuffing too many documents into a model triggers the "Lost in the Middle" problem.
* The Anatomy of an Agent: How we transform a standard text-predictor into an active agent using a core LLM surrounded by an external environment, reasoning protocols, memory structures, and tools.
* Reasoning & Planning (ReAct vs. Reflexion): Unpacking the massive breakthrough of the ReAct (Reason + Act) framework, and how self-correction loops and multi-agent debates drastically reduce AI hallucinations.
* The Cognitive Architecture (Memory & Tool Use): Distinguishing between Episodic, Semantic, and Procedural memory (including how MemGPT acts like an Operating System). Plus, how models like Toolformer teach themselves to use external APIs.
* The Python "While True" Loop: Demystifying the engineering behind agents by looking at the simple code loops that power them, and the massive challenges the industry faces in trying to evaluate open-ended AI behavior.

Note: This is an AI-generated discussion created using Google's NotebookLM, based on publicly available Stanford University course material (specifically CS224N) and personal study notes from my learning journey.

19 June 2026 · 22 min

EP 47 | CS224N: Efficient Adaptation

We know how to build and align massive foundational models, but what if you don't have a $100 million supercomputer? In this episode, we tackle the practical wall of modern AI: compute costs. We explore how researchers are circumventing astronomical expenses to adapt massive models efficiently, pushing the boundaries of what you can train on a single consumer GPU, and why efficiency is becoming an environmental imperative.

Key Topics:
* Fixing RLHF with DPO: Why the industry is abandoning complex reinforcement learning for Direct Preference Optimization, and the ethical reality of the "digital sweatshops" providing our preference data.
* The Power and Limits of Prompting: Unlocking Zero-Shot capabilities and Chain-of-Thought reasoning, while acknowledging the fragile, compute-heavy "dark art" of prompt engineering.
* The PEFT Revolution & LoRA: The brilliant math behind Low-Rank Adaptation that reduces trainable parameters by 99.9% with zero added inference latency.
* Adapters & Soft Prompts: How inserting tiny bottleneck networks enables modular, plug-and-play skills, like swapping between different language dialects on the fly without altering the base model.

Note: This is an AI-generated discussion created using Google's NotebookLM, based on publicly available Stanford University course material (specifically CS224N) and personal study notes from my learning journey.

19 June 2026 · 20 min

EP 46 | CS224N: Post-training

How do we turn a raw, chaotic text-predictor into a helpful, conversational AI assistant? In this episode, we dive into the massive pipeline of Post-training. We explore the transition from Instruction Fine-Tuning to complex Reinforcement Learning, and why teaching an AI to be "helpful" sometimes inadvertently teaches it to lie.

Key Topics:
* The Alignment Problem: Why a raw foundational model is just a "document completer" and how Instruction Fine-Tuning (IFT) begins the process of teaching it to follow user commands.
* RLHF & Reward Models: How we use pairwise human comparisons to train a Reward Model, and how PPO is used to optimize the AI's behavior without breaking its grammar.
* Reward Hacking & Hallucinations: The dark side of RLHF. We explore why heavily incentivizing models to sound authoritative leads to massive real-world failures, like Bing's sports hallucinations and the roughly $100 billion drop in Alphabet's market value after a Google Bard demo error.
* The DPO Breakthrough: How researchers removed the unstable reinforcement learning step entirely with Direct Preference Optimization, creating the new open-source standard.
* Ethical Realities: A candid look at the human cost of AI alignment, from low-wage "digital sweatshops" to the severe annotator biases that bleed directly into modern models.

Note: This is an AI-generated discussion created using Google's NotebookLM, based on publicly available Stanford University course material (specifically CS224N) and personal study notes from my learning journey.

11 June 2026 · 22 min
