Module 6: RAG | Chunking - Where You Cut Decides What Gets Found

The AI Concepts Podcast · 10 min · 29 April 2026

Description

This episode is about chunking, the quiet step in a RAG pipeline that decides whether your system retrieves the right answer or a confidently wrong one. It covers why the chunk is the real unit of retrieval, the tradeoff between context and precision, the main strategies teams use to split documents, and why testing your chunks against real questions matters more than picking the perfect size.
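Here is a minimal Python sketch of two common splitting strategies, plus the idea of testing chunks against real questions. It is not code from the episode. It assumes word counts as a stand-in for tokens and uses hypothetical function names (`fixed_size_chunks`, `paragraph_chunks`, `answer_coverage`). The test is crude: it checks whether each expected answer lands whole inside a single chunk, or whether a cut split it apart.

```python
from typing import List, Tuple


def fixed_size_chunks(text: str, size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into windows of `size` words; each window repeats the last
    `overlap` words of the previous one (words stand in for tokens here)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reached the end
            break
    return chunks


def paragraph_chunks(text: str, max_words: int = 50) -> List[str]:
    """Pack whole blank-line-separated paragraphs into chunks of at most
    `max_words` words. A single paragraph longer than that stays intact as its
    own oversized chunk rather than being cut mid-thought."""
    chunks, current, count = [], [], 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        n = len(para.split())
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def answer_coverage(chunks: List[str], qa_pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (question, answer) pairs whose answer text appears, whitespace-
    normalized, entirely inside at least one chunk."""
    def norm(s: str) -> str:
        return " ".join(s.split())

    normalized = [norm(c) for c in chunks]
    hits = sum(any(norm(answer) in c for c in normalized) for _, answer in qa_pairs)
    return hits / len(qa_pairs)


if __name__ == "__main__":
    # Hypothetical document and question set, for illustration only.
    doc = ("Refunds are issued within 14 days of the return arriving.\n\n"
           "To request a refund, email support with your order number.")
    questions = [("How long do refunds take?", "within 14 days of the return arriving")]
    for size in (4, 12):
        chunks = fixed_size_chunks(doc, size=size, overlap=2)
        print(size, answer_coverage(chunks, questions))
```

You can rerun `answer_coverage` on the same question set while changing `size`, `overlap`, or the splitting strategy. This gives a quick signal of whether your cuts separate answers from their context. A fuller evaluation would run actual retrieval against those questions instead of a substring check.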


All episodes

73 episodes

Module 6: RAG | GraphRAG - When Relationships Matter More Than Text

This episode addresses the category of questions that vector search fundamentally cannot answer: questions about relationships between things. We explore what a knowledge graph is and why traversing connections between entities requires a completely different data structure from the one used for semantic similarity search. We break down Microsoft's GraphRAG approach: how it extracts entities and relationships from documents during indexing, uses community detection to identify clusters of related knowledge, and generates community summaries that enable global queries across an entire corpus rather than just local document retrieval. We cover the cost improvements brought by LazyGraphRAG, the hybrid vector-plus-graph pattern most production teams are moving toward, Neo4j as the go-to graph database, and a lighter-weight entity extraction approach for teams not ready for a full knowledge graph. By the end you will understand when relationships matter more than text and how to build systems that can answer both kinds of questions.

Yesterday · 8 min

Module 6: RAG | Query Transformation - When the Question Is the Bottleneck

This episode addresses a retrieval failure that has nothing to do with your index and everything to do with the query itself. We explore the vocabulary gap between how people ask questions and how documents are written, and why even strong embedding models cannot always bridge it. We break down three techniques that fix the query before the search runs: query rewriting, which reformulates casual language into formal search terms; HyDE (Hypothetical Document Embeddings), which generates a hypothetical answer and uses it as the search query instead of the question; and multi-query expansion, which generates multiple phrasings to cast a wider retrieval net. We also cover step-back prompting for queries that need broader conceptual grounding before searching. By the end you will understand why the question itself is often the highest-leverage thing to improve in a retrieval pipeline.

Yesterday · 7 min

Module 6: RAG | Parent-Child Indexing - Search Small, Retrieve Big

This episode addresses the fundamental tension between retrieval precision and generation context. We explore why small chunks produce tight embeddings that retrieve well but leave the model without enough surrounding information, and why large chunks give the model context but dilute the embedding and hurt search quality. We break down parent-child indexing as the solution that decouples these two problems entirely, how child chunks handle the search and parent chunks handle the generation, and how to structure the hierarchy for documents of different complexity. We cover practical implementations in LlamaIndex and LangChain and close with guidance on when this pattern earns its place in a pipeline. By the end you will understand how to stop choosing between finding the right thing and giving the model enough to work with.

Yesterday · 7 min
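Here is a minimal sketch of "search small, retrieve big" for the parent-child episode above. It is not how LlamaIndex or LangChain implement the pattern. The names `Child`, `build_index`, `score`, and `retrieve` are hypothetical. A crude term-overlap `score` stands in for embedding similarity so the example runs without a model. The retriever scores child chunks but returns full parent text, and it deduplicates parents so that several matching children from one parent return it only once.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Child:
    text: str
    parent_id: str


def build_index(parents: Dict[str, str], child_size: int = 8) -> List[Child]:
    """Cut every parent into small word-window children that point back to it."""
    children = []
    for pid, text in parents.items():
        words = text.split()
        for i in range(0, len(words), child_size):
            children.append(Child(" ".join(words[i:i + child_size]), pid))
    return children


def score(query: str, text: str) -> float:
    """Stand-in for embedding cosine similarity (an assumption for this sketch):
    the fraction of query terms that appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def retrieve(query: str, parents: Dict[str, str], children: List[Child], k: int = 2) -> List[str]:
    """Rank the small children, then return up to k distinct parents in the order
    their best child ranked. No score threshold is applied."""
    ranked = sorted(children, key=lambda c: score(query, c.text), reverse=True)
    seen, results = set(), []
    for child in ranked:
        if child.parent_id not in seen:
            seen.add(child.parent_id)
            results.append(parents[child.parent_id])
        if len(results) == k:
            break
    return results
```

Because the embedding (here, the score) only ever sees the small child, it stays focused. The model still receives the whole parent as context.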

Module 6: RAG | Reranking - The Second Stage That Gets Retrieval Right

This episode addresses the gap between finding candidate chunks and finding the right ones. We explore the bi-encoder bottleneck (why compressing text into a single vector for comparison loses critical nuance) and how cross-encoders fix it by reading the query and document together in a single forward pass. We introduce ColBERT, whose token-level late interaction offers a powerful middle ground between speed and accuracy, walk through the production tooling landscape including Cohere Rerank, BGE models, and RAGatouille, and close by stitching hybrid search and reranking into a complete three-stage retrieval funnel. By the end you will understand why two-stage retrieval is now the standard architecture for any serious RAG pipeline.

Yesterday · 9 min
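The late-interaction idea behind ColBERT can be illustrated with hand-made toy vectors in place of a trained model. The vectors and function names here are illustrative assumptions. `maxsim_score` normalizes each token embedding, takes each query token's best dot product over the document's tokens, and sums those maxima. `mean_pool_cosine` first collapses each side into one averaged vector. This is a simplified stand-in for the single-vector comparison a bi-encoder makes, not how any particular bi-encoder pools.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """L2-normalize so that dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late interaction: for every query token, keep only its best-matching
    document token, then sum those maxima. Assumes doc_tokens is non-empty."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)


def mean_pool_cosine(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Single-vector comparison: average each side's token vectors, then cosine."""
    def mean(vectors: List[Vector]) -> List[float]:
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    return dot(normalize(mean(query_tokens)), normalize(mean(doc_tokens)))
```

Take a query with tokens `[1, 0]` and `[0, 1]`. A document with the same two tokens scores 2.0 under MaxSim, while a document with a single token `[1, 1]` scores about 1.41. Mean pooling gives both documents the same cosine, which is the kind of nuance a single vector can erase.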

Module 6: RAG | Dense and Sparse Search - Why Vector Search Alone Is Not Enough

This episode addresses one of the most common gaps in RAG pipelines: relying solely on semantic search. We explore how dense retrieval works and where it excels, then introduce sparse retrieval with BM25 and why it catches what vector search misses, particularly exact identifiers like part numbers, codes, and proper nouns. We break down how hybrid search combines both approaches using Reciprocal Rank Fusion, why it consistently outperforms either method alone, and how modern vector databases like Weaviate, Pinecone, and Qdrant support this natively. By the end you will understand why the best retrieval systems are not choosing between semantic and keyword search but running both.

Yesterday · 11 min
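Reciprocal Rank Fusion, mentioned above, is only a few lines. In its commonly cited form, each document's fused score is the sum over all result lists of 1 / (k + rank), with k = 60 as the usual default. The sketch below assumes that form, and the document IDs are made up. Because RRF looks only at ranks, it avoids having to reconcile BM25 scores and cosine similarities, which live on different scales.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs (best first) into one list.
    Each appearance contributes 1 / (k + rank), with rank starting at 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Hypothetical results for one query from a dense and a sparse (BM25) retriever.
dense_results = ["d1", "d2", "d3"]
sparse_results = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([dense_results, sparse_results])
```

Documents that both retrievers rank reasonably well (`d1`, `d3`) rise to the top. A document that only one retriever found (`d4`, say an exact part-number match) still makes the fused list instead of being dropped.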
