The AI Concepts Podcast

Module 6: RAG | Reranking - The Second Stage That Gets Retrieval Right

9 min · Jun 10, 2026

Description

This episode addresses the gap between finding candidate chunks and finding the right ones. We explore the bi-encoder bottleneck, why compressing text into a single vector for comparison loses critical nuance, and how cross-encoders fix this by reading the query and document together in a single forward pass. We introduce ColBERT as a powerful middle ground between speed and accuracy through token-level late interaction, walk through the production tooling landscape including Cohere Rerank, BGE models, and RAGatouille, and close by stitching hybrid search and reranking into a complete three-stage retrieval funnel. By the end you will understand why two-stage retrieval is now the standard architecture for any serious RAG pipeline.
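The contrast between single-vector comparison and token-level late interaction can be illustrated with a toy sketch. This is not taken from the episode: the two-dimensional "token embeddings" are made up, mean pooling stands in for whatever compression a real bi-encoder learns, and the helper names (`mean_pool`, `maxsim_score`, `rerank`) are hypothetical. The scoring rule itself follows the ColBERT idea of summing, over query tokens, the best-matching document token similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_pool(token_vectors):
    """Collapse token vectors into one vector (assumed stand-in for a bi-encoder)."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]


def maxsim_score(query_tokens, doc_tokens):
    """Late interaction: each query token keeps only its best document match."""
    return sum(max(cosine(q, d) for d in doc_tokens) for q in query_tokens)


def rerank(query_tokens, candidates, top_k):
    """Second stage: reorder first-stage candidates by late-interaction score."""
    ranked = sorted(
        candidates,
        key=lambda doc_id: maxsim_score(query_tokens, candidates[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]


# Made-up token embeddings: the query has two distinct "concepts".
query = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0]],  # matches each query token exactly
    "doc_b": [[1.0, 1.0], [1.0, 1.0]],  # only a blurry mix of both concepts
}

# Single-vector view: both documents look equally similar to the query.
pooled_a = cosine(mean_pool(query), mean_pool(candidates["doc_a"]))
pooled_b = cosine(mean_pool(query), mean_pool(candidates["doc_b"]))

# Token-level view: doc_a clearly wins.
late_a = maxsim_score(query, candidates["doc_a"])
late_b = maxsim_score(query, candidates["doc_b"])

best = rerank(query, candidates, top_k=1)
```

In this toy case the pooled vectors cannot separate the two documents, while the token-level score can. A cross-encoder goes one step further by scoring the query and document text jointly with a model, which is why it is usually run only on the short candidate list produced by the first stage.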

All episodes

74 episodes


Module 6: RAG | Long Context vs RAG - Do You Still Need Retrieval at All

This episode closes out Module 6 by tackling the question that has been getting louder since large context windows arrived. If a model can hold hundreds of thousands or even millions of tokens at once, do we still need all the architecture we just spent this module building? We explore why RAG was never just about fitting text into a small prompt, what retrieval is actually doing that a large context window cannot, and how the shift from compression to curation changes what good RAG looks like today. We cover when long context is genuinely the better tool, when retrieval still matters deeply, and why in most real enterprise systems the best answer is both working together. The episode closes with the argument that RAG is not disappearing. It is maturing. And everything we built in this module is part of that stronger foundation. By the end you will have a clear and honest picture of where these two approaches fit, and why understanding both puts you well ahead of most people working in this space.

Jun 12, 2026 · 8 min

Module 6: RAG | GraphRAG - When Relationships Matter More Than Text

This episode addresses the category of questions that vector search fundamentally cannot answer, questions about relationships between things. We explore what a knowledge graph is and why traversing connections between entities requires a completely different data structure than semantic similarity search. We break down Microsoft's GraphRAG approach, how it extracts entities and relationships from documents during indexing, uses community detection to identify clusters of related knowledge, and generates summaries that enable global queries across an entire corpus rather than just local document retrieval. We cover the cost improvements brought by LazyGraphRAG, the hybrid vector-plus-graph pattern most production teams are moving toward, Neo4j as the go-to graph database, and a lighter-weight entity extraction approach for teams not ready for a full knowledge graph. By the end you will understand when relationships matter more than text and how to build systems that can answer both kinds of questions.

Jun 10, 2026 · 8 min

Module 6: RAG | Query Transformation - When the Question Is the Bottleneck

This episode addresses a retrieval failure that has nothing to do with your index and everything to do with the query itself. We explore the vocabulary gap between how people ask questions and how documents are written, and why even strong embedding models cannot always bridge it. We break down three techniques that fix the query before the search runs: query rewriting to reformulate casual language into formal search terms, HyDE which generates a hypothetical answer and uses that as the search query instead of the question, and multi-query expansion which generates multiple phrasings to cast a wider retrieval net. We also cover step-back prompting for queries that need broader conceptual grounding before searching. By the end you will understand why the question itself is often the highest-leverage thing to improve in a retrieval pipeline.

Jun 10, 2026 · 7 min

Module 6: RAG | Parent-Child Indexing - Search Small, Retrieve Big

This episode addresses the fundamental tension between retrieval precision and generation context. We explore why small chunks produce tight embeddings that retrieve well but leave the model without enough surrounding information, and why large chunks give the model context but dilute the embedding and hurt search quality. We break down parent-child indexing as the solution that decouples these two problems entirely, how child chunks handle the search and parent chunks handle the generation, and how to structure the hierarchy for documents of different complexity. We cover practical implementations in LlamaIndex and LangChain and close with guidance on when this pattern earns its place in a pipeline. By the end you will understand how to stop choosing between finding the right thing and giving the model enough to work with.

Jun 10, 2026 · 7 min