Outbound Kitchen - B2B Sales Podcast
Want the prompt I used for this test, plus my AI Prompt Library with 30+ outbound prompts? Upgrade now in my newsletter here. [https://newsletter.outbound.kitchen/p/i-tested-gpt-5-and-opus-41-for-account]

I tested seven AI models on the same account research prompt: 12 specific instructions, one target company (Replit), one buyer lens (TrackRec). This is my March 2026 benchmark.

The models: Perplexity Sonar, GPT 5.2 Thinking, Grok 4.2 Beta, Grok 4, Claude Opus 4.6, Claygent (Argon), and Gemini 3 Pro.

I scored every model on six weighted criteria, tracked which instructions each model actually completed, classified the reason for each miss, and manually verified every disputed claim.

Agenda:
- Why I expanded from 3 scoring criteria to 6, and how adding Business Relevance changed the rankings
- What instruction completion reveals that scores alone don't (Perplexity: 10/12, Gemini: 1/12)
- The difference between hallucinations and false claims, and why it matters for automation at scale
- Why four models found September funding and stopped looking (the persistence failure pattern)
- The $400M funding round that may or may not be real: REPORTED vs VERIFIED as a new verification category
- Which model to use for high-value accounts vs volume enrichment in Clay
- Web app vs API vs Clay: why your results will be different and what I'm testing in the next benchmark

Referenced:
- TrackRec: https://www.trackrec.co
- Replit: https://replit.com
- Perplexity: https://www.perplexity.ai
- Clay: https://www.clay.com
- RepVue: https://www.repvue.com
- The account research prompt: Available for Outbound Kitchen paid members

Who am I? Elric Legloire, founder of Outbound Kitchen.

When you're ready 👨‍🍳
Want to work with me? Send me a DM [https://www.linkedin.com/in/elriclegloire/]

---

Connect with me
📌 Connect on LinkedIn [https://www.linkedin.com/in/elriclegloire/]
📹 Subscribe on YouTube [https://www.youtube.com/@ElricLegloireOutbound]
🐦 Connect on X [https://x.com/elriclegloire]

Chapters:
(0:00) - Why I keep benchmarking AI models
(1:45) - The test setup: TrackRec researching Replit
(3:00) - What changed from the last test (6 criteria, instruction tracking)
(3:30) - The new rankings
(4:05) - Perplexity: VP of SDR, podcast, RepVue miss
(5:00) - GPT 5.2: zero false claims, Glassdoor depth
(5:30) - The $400M funding round: is it real?
(7:00) - Grok 4.2: 56 seconds, best RepVue data
(8:00) - Bottom four models (quick summary)
(8:55) - Verification: hallucinations vs false claims
(10:05) - Which models I recommend
(10:45) - Web app vs Clay availability
(11:30) - What's next
118 episodes