NeLI Pod
Guest: Dr. Jeremy Pickens, Managing Director of Applied Science, Elevate

Episode Overview
In this masterclass-level conversation, Dr. Jeremy Pickens, one of the most respected information retrieval scientists in e-discovery, joins Daniel and Brandon to explore the intellectual foundations and future trajectory of search, relevance, and AI in legal practice. Jeremy’s work has shaped the field’s evolution from keyword search to TAR 1.0, then to continuous active learning (CAL), and now into the GenAI era. If you’ve used active learning in any modern review platform, you’ve likely benefited from his research.
The discussion ranges from polyphonic music retrieval to tokenization, from ancient Greek philosophy to the cold-start problem in TAR, and from contextual diversity to the challenges of evaluating AI systems. Jeremy brings a rare blend of deep technical rigor and practical sensibility, offering a perspective that helps legal professionals understand not just what works, but why it works.

Key Takeaways
* Patterns matter more than keywords. Jeremy’s early work in polyphonic music retrieval mirrors the complexity of legal documents: both require identifying structural patterns, not just surface-level signals. (“Finding those connections historically over time… is very similar to the storytelling and pattern finding we want to do in e-discovery.”)
* Feature extraction is as important as the algorithm. Tokenization, stemming, and sub-word representations can make or break a machine learning model’s ability to recognize meaning across documents.
* Outcome-driven evaluation beats checkbox shopping. Lawyers should focus on how well a system performs on real data, not on whether it claims to use a particular algorithm or technique.

Action Items for Legal Teams
* Evaluate platforms using simulations, not demos. Ask vendors to run your data through their system to measure recall, precision, and learning speed.
* Understand the basics of tokenization. Even a high-level grasp helps practitioners make better decisions about search and review workflows.
* Adopt CAL for early signal exploitation. Even a single coded document provides useful information, so there’s no need for massive seed sets.

Chapters & Timecodes
00:00:00 – Introduction
Daniel and Brandon introduce Dr. Jeremy Pickens and his impact on the field.
00:03:04 – Jeremy’s Philosophy: Being “Part of the Flow”
Why ideas in e-discovery evolve collectively, not individually.
00:04:55 – From Polyphonic Music to Legal Documents
How musical pattern analysis informed Jeremy’s approach to information retrieval.
00:08:34 – Short Messages, Semantic Boundaries, and IR Challenges
Why Slack, Teams, and SMS require smarter segmentation techniques.
00:11:03 – Feature Extraction 101
Tokenization, stemming, n-grams, and why they matter for TAR.
00:14:53 – Sub-Word Tokenization and OCR
How character-level patterns help overcome noisy text.
00:17:48 – What Practitioners Should Ask Vendors
Why checklists fail, and what outcome-driven evaluation looks like.
00:20:34 – The Importance of Frequent Model Updates
How recalculating rankings every two minutes improved precision by up to 20%.
00:22:40 – Why Simulations Are the Missing Piece
Jeremy explains why the industry needs better evaluation frameworks.
00:24:40 – Contextual Diversity: Finding What You Don’t Know
How algorithms identify unexplored pockets of documents.
00:28:56 – Solving the Cold-Start Problem
Why CAL can begin learning from the very first document.
00:30:01 – Greek Philosophy and TAR
Parmenides vs. Heraclitus as a metaphor for TAR 1.0 vs. TAR 2.0.

Compelling Quote
“You don’t know what you don’t know… and the machine can look globally across the entire collection to find what you’ve never seen before.”
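The feature-extraction takeaway, along with the chapters on tokenization and sub-word tokenization for OCR, can be made concrete with a minimal Python sketch. It is not taken from the episode or from any review platform. The regex tokenizer, the suffix-stripping `naive_stem` (a toy stand-in for a real stemmer such as Porter's), the trigram size, and the sample strings are all illustrative assumptions.

```python
import re

def word_tokens(text: str) -> list[str]:
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def naive_stem(token: str) -> str:
    """Strip one common English suffix (a toy stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words like "is" survive.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Character n-grams taken within each word, with '#' marking word boundaries."""
    grams = set()
    for tok in word_tokens(text):
        padded = f"#{tok}#"
        for i in range(len(padded) - n + 1):
            grams.add(padded[i : i + n])
    return grams

def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

clean = "Indemnification agreement"
noisy = "lndemnificatlon agreernent"  # simulated OCR errors: I->l, i->l, m->rn

print([naive_stem(t) for t in word_tokens("Reviewed documents, reviewing emails")])
# ['review', 'document', 'review', 'email']
print(jaccard(set(word_tokens(clean)), set(word_tokens(noisy))))  # 0.0
print(round(jaccard(char_ngrams(clean), char_ngrams(noisy)), 2))  # 0.48
```

Because OCR has damaged every word, the two strings share no whole-word tokens, yet their character-trigram sets still overlap (Jaccard similarity 16/33, about 0.48). That surviving overlap is the intuition behind character-level features for noisy text; real systems typically use richer sub-word schemes than fixed trigrams.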
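The action items on simulation-based evaluation and on starting CAL from a single coded document can be illustrated together. Below is a minimal Python sketch of a simulation harness, assuming a collection whose relevance labels are already known: it begins with one seed document, retrains a deliberately crude term-weighting model after every batch, and records recall and precision as review proceeds. The function names, the scoring rule (a simple relevance-feedback tally standing in for the classifiers real platforms use), and the six sample documents are hypothetical; this is not the method discussed in the episode or the algorithm of any particular platform.

```python
import re
from collections import Counter

def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a deliberately simple feature extractor."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def simulate_cal(docs: list[str], labels: list[bool], seed: int,
                 batch_size: int = 2) -> list[tuple[int, float, float]]:
    """Toy continuous active learning (CAL) simulation against known labels.

    Starts from ONE coded document (`seed`) and, until every document is
    reviewed: ranks unreviewed docs with the current model, "reviews" the
    top `batch_size` by revealing their labels, and retrains on them.
    Returns (docs_reviewed, recall, precision_of_reviewed) after each batch.
    """
    total_relevant = sum(labels)
    if total_relevant == 0:
        raise ValueError("recall is undefined without any relevant documents")
    weights = Counter()  # term -> learned weight (may go negative)

    def learn(i: int) -> None:
        # Relevance-feedback update: +1 per term in a relevant doc, -1 otherwise.
        for term in tokens(docs[i]):
            weights[term] += 1 if labels[i] else -1

    def score(i: int) -> int:
        return sum(weights[term] for term in tokens(docs[i]))

    reviewed = {seed}
    learn(seed)
    curve = []
    while len(reviewed) < len(docs):
        unreviewed = [i for i in range(len(docs)) if i not in reviewed]
        # Re-rank everything after each batch; ties broken by index for determinism.
        ranked = sorted(unreviewed, key=lambda i: (-score(i), i))
        for i in ranked[:batch_size]:
            reviewed.add(i)
            learn(i)
        found = sum(labels[i] for i in reviewed)
        curve.append((len(reviewed), found / total_relevant, found / len(reviewed)))
    return curve

# Hypothetical six-document collection with known labels (True = relevant).
docs = [
    "price fixing meeting with competitor",
    "competitor pricing agreement discussed at meeting",
    "lunch order for friday",
    "fixing the printer again",
    "agreement on price increases with competitor",
    "holiday party schedule",
]
labels = [True, True, False, False, True, False]

curve = simulate_cal(docs, labels, seed=0)
print(curve)  # [(3, 1.0, 1.0), (5, 1.0, 0.6), (6, 1.0, 0.5)]
```

In this toy run, the seed's vocabulary alone ranks the two other relevant documents into the first batch, so recall reaches 1.0 after three documents. Retraining after every batch, rather than once after a fixed seed set, is what makes the loop "continuous" in this sketch, and the recall-versus-documents-reviewed curve is one way to express learning speed. A real evaluation would apply the same idea to a fully labeled collection and a real classifier.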