The AI Concepts Podcast

Module 6: RAG | Long Context vs RAG - Do You Still Need Retrieval at All

8 min · 12 June 2026

Description

This episode closes out Module 6 by tackling the question that has been getting louder since large context windows arrived. If a model can hold hundreds of thousands or even millions of tokens at once, do we still need all the architecture we just spent this module building? We explore why RAG was never just about fitting text into a small prompt, what retrieval is actually doing that a large context window cannot, and how the shift from compression to curation changes what good RAG looks like today. We cover when long context is genuinely the better tool, when retrieval still matters deeply, and why in most real enterprise systems the best answer is both working together. The episode closes with the argument that RAG is not disappearing. It is maturing. And everything we built in this module is part of that stronger foundation. By the end you will have a clear and honest picture of where these two approaches fit, and why understanding both puts you well ahead of most people working in this space.

All episodes

74 episodes

Module 6: RAG | GraphRAG - When Relationships Matter More Than Text

This episode addresses the category of questions that vector search fundamentally cannot answer, questions about relationships between things. We explore what a knowledge graph is and why traversing connections between entities requires a completely different data structure than semantic similarity search. We break down Microsoft's GraphRAG approach, how it extracts entities and relationships from documents during indexing, uses community detection to identify clusters of related knowledge, and generates summaries that enable global queries across an entire corpus rather than just local document retrieval. We cover the cost improvements brought by LazyGraphRAG, the hybrid vector-plus-graph pattern most production teams are moving toward, Neo4j as the go-to graph database, and a lighter-weight entity extraction approach for teams not ready for a full knowledge graph. By the end you will understand when relationships matter more than text and how to build systems that can answer both kinds of questions.

10 June 2026 · 8 min

Module 6: RAG | Query Transformation - When the Question Is the Bottleneck

This episode addresses a retrieval failure that has nothing to do with your index and everything to do with the query itself. We explore the vocabulary gap between how people ask questions and how documents are written, and why even strong embedding models cannot always bridge it. We break down three techniques that fix the query before the search runs: query rewriting to reformulate casual language into formal search terms, HyDE which generates a hypothetical answer and uses that as the search query instead of the question, and multi-query expansion which generates multiple phrasings to cast a wider retrieval net. We also cover step-back prompting for queries that need broader conceptual grounding before searching. By the end you will understand why the question itself is often the highest-leverage thing to improve in a retrieval pipeline.

10 June 2026 · 7 min
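
The multi-query expansion idea from this episode description can be sketched in a few lines. This is a minimal illustration, not the episode's own code: the corpus, the `keyword_search` stand-in retriever, and the `fake_expand` function are all hypothetical, and in a real pipeline the rewrites would come from an LLM and the search from a vector or hybrid index.

```python
from typing import Callable, Dict, List

# Toy corpus; document ids and texts are made up for illustration.
CORPUS: Dict[str, str] = {
    "kb-1": "Procedure for resetting account credentials",
    "kb-2": "Refund policy for annual subscriptions",
    "kb-3": "How to change your password from the login page",
}


def keyword_search(query: str, k: int = 2) -> List[str]:
    """Stand-in retriever: rank documents by the number of shared lowercase words."""
    q_words = set(query.lower().split())
    scored = [
        (len(q_words & set(text.lower().split())), doc_id)
        for doc_id, text in CORPUS.items()
    ]
    hits = [(score, doc_id) for score, doc_id in scored if score > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in hits[:k]]


def fake_expand(question: str) -> List[str]:
    """Hypothetical expander: hardcoded rewrites standing in for LLM output."""
    return ["resetting account credentials", "change password login"]


def multi_query_retrieve(
    question: str,
    expand: Callable[[str], List[str]],
    search: Callable[[str], List[str]] = keyword_search,
) -> List[str]:
    """Search with the original question plus every rewrite, then merge
    the results, keeping first-seen order and dropping duplicates."""
    merged: List[str] = []
    for query in [question, *expand(question)]:
        for doc_id in search(query):
            if doc_id not in merged:
                merged.append(doc_id)
    return merged


print(multi_query_retrieve("forgot my pw", fake_expand))
```

The casual question shares no words with any document, so on its own it retrieves nothing from this toy index; the rewrites are what surface the two relevant entries. That is the vocabulary gap the description refers to, in miniature.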

Module 6: RAG | Parent-Child Indexing - Search Small, Retrieve Big

This episode addresses the fundamental tension between retrieval precision and generation context. We explore why small chunks produce tight embeddings that retrieve well but leave the model without enough surrounding information, and why large chunks give the model context but dilute the embedding and hurt search quality. We break down parent-child indexing as the solution that decouples these two problems entirely, how child chunks handle the search and parent chunks handle the generation, and how to structure the hierarchy for documents of different complexity. We cover practical implementations in LlamaIndex and LangChain and close with guidance on when this pattern earns its place in a pipeline. By the end you will understand how to stop choosing between finding the right thing and giving the model enough to work with.

10 June 2026 · 7 min
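
The "search small, retrieve big" pattern described above can be sketched without any framework. This is an assumption-laden toy, not the LlamaIndex or LangChain implementation the episode covers: the `PARENTS` data, the sentence-level split, and the word-overlap `score` function are hypothetical stand-ins for real sections, a real chunker, and embedding similarity.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Child:
    text: str       # small unit that gets searched
    parent_id: str  # pointer back to the larger unit handed to the model


# Hypothetical parent sections, made up for illustration.
PARENTS: Dict[str, str] = {
    "sec-1": "Refunds are issued within 14 days. Contact billing to start a refund. "
             "Refunds go to the original payment method.",
    "sec-2": "Passwords must have 12 characters. Reset links expire after one hour.",
}


def build_children(parents: Dict[str, str]) -> List[Child]:
    """Split each parent into sentence-sized children that remember their parent."""
    children: List[Child] = []
    for parent_id, text in parents.items():
        for sentence in text.split(". "):
            children.append(Child(sentence.strip(". ").lower(), parent_id))
    return children


def score(query: str, text: str) -> int:
    """Stand-in for embedding similarity: count of shared words."""
    return len(set(query.lower().split()) & set(text.split()))


def retrieve_parents(
    query: str, parents: Dict[str, str], children: List[Child], k: int = 2
) -> List[str]:
    """Rank the children, then return the deduplicated parents of the top k matches."""
    ranked = sorted(children, key=lambda c: score(query, c.text), reverse=True)
    parent_ids: List[str] = []
    for child in ranked[:k]:
        if score(query, child.text) > 0 and child.parent_id not in parent_ids:
            parent_ids.append(child.parent_id)
    return [parents[pid] for pid in parent_ids]


print(retrieve_parents("reset links expire", PARENTS, build_children(PARENTS)))
```

The match happens on a single sentence, but the whole password section is returned, so the generator also sees the neighbouring rule about password length.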

Module 6: RAG | Reranking - The Second Stage That Gets Retrieval Right

This episode addresses the gap between finding candidate chunks and finding the right ones. We explore the bi-encoder bottleneck, why compressing text into a single vector for comparison loses critical nuance, and how cross-encoders fix this by reading the query and document together in a single forward pass. We introduce ColBERT as a powerful middle ground between speed and accuracy through token-level late interaction, walk through the production tooling landscape including Cohere Rerank, BGE models, and RAGatouille, and close by stitching hybrid search and reranking into a complete three-stage retrieval funnel. By the end you will understand why two-stage retrieval is now the standard architecture for any serious RAG pipeline.

10 June 2026 · 9 min
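
As a rough sketch of the three-stage funnel mentioned in this description: two first-stage retrievers produce ranked lists, the lists are fused, and a more expensive scorer reorders the fused candidates. Several things here are assumptions rather than details from the episode: reciprocal rank fusion is used as the fusion step because it is a common choice, the document ids and texts are invented, and `overlap_scorer` is a trivial stand-in for a cross-encoder or a hosted reranker.

```python
from typing import Callable, Dict, List


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per document."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


def overlap_scorer(query: str, text: str) -> float:
    """Hypothetical stand-in for a cross-encoder: fraction of query words in the text."""
    q_words = set(query.lower().split())
    return len(q_words & set(text.lower().split())) / len(q_words)


def rerank(
    query: str,
    candidates: List[str],
    texts: Dict[str, str],
    scorer: Callable[[str, str], float],
    top_n: int,
) -> List[str]:
    """Score every (query, document) pair jointly and keep the best top_n."""
    ranked = sorted(candidates, key=lambda d: (-scorer(query, texts[d]), d))
    return ranked[:top_n]


# Invented documents and first-stage results, for illustration only.
TEXTS = {
    "a": "cross encoder reads query and document together",
    "b": "bi encoder compresses text into one vector",
    "c": "bm25 keyword search",
    "d": "late interaction compares tokens",
}
keyword_hits = ["a", "b", "c"]   # stage 1a: sparse retriever output
vector_hits = ["b", "d", "a"]    # stage 1b: dense retriever output

fused = rrf_fuse([keyword_hits, vector_hits])                              # stage 2
final = rerank("cross encoder query document", fused, TEXTS, overlap_scorer, top_n=2)  # stage 3
print(fused, final)
```

Fusion only looks at rank positions, so it is cheap enough to run over many candidates; the pairwise scorer sees the query and the text together, which is why it is reserved for the short list that survives fusion.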