AI Gives You the Vocabulary. It Doesn’t Give You the Expertise.

19 min · 26 Apr 2026

Description

Hiring managers are watching something uncomfortable happen in interview rooms right now. Candidates arrive with the right credentials, the right vocabulary, the right tool stack on their résumés, and then someone asks them to reason through a problem out loud, and the room goes quiet in the wrong way. Not in the thoughtful kind of way but the empty kind that tells you the person across the table has never actually had to think through a hard problem on their own.

And research is converging on the same conclusion. Microsoft [https://www.microsoft.com/en-us/research/wp-content/uploads/2025/01/lee_2025_ai_critical_thinking_survey.pdf], the Swiss Business School [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5082524], and TestGorilla [https://www.testgorilla.com/skills-based-hiring/state-of-skills-based-hiring-2025/] have all documented the same pattern independently: heavy AI reliance correlates directly with declining critical thinking, and the effect is strongest in younger, less experienced practitioners.

This isn’t a technology story so much as a cognition story, and the SEO industry is living a version of it in slow motion. What none of those studies name is the specific mechanism: the three-layer architecture of expertise where AI commands the retrieval layer completely, and the judgment layers underneath it are more exposed than they’ve ever been. That architecture is what this piece is about.

The debate is framed on the wrong axis

Every conversation about AI and critical thinking eventually lands in the same place: humans versus machines, organic thinking versus generated output, authentic expertise versus artificial fluency. It’s a compelling frame and also the wrong one.

The real fracture line isn’t human versus AI. It’s retrieval versus judgment, and those are not the same cognitive act, even though AI has made them feel interchangeable in ways that should concern anyone serious about their craft.

Retrieval is access.
It’s the ability to surface relevant information, synthesize patterns across a body of knowledge, and produce fluent output that maps to the shape of expertise. Large language models are extraordinary at this, genuinely and structurally superior to any individual human at the retrieval layer, and getting better at speed. Fighting that reality is not a strategy.

Judgment, however, is different. Judgment is knowing which question is actually the right question given this specific context, the ability to recognize when something that looks correct is wrong for this situation in ways that aren’t in any training data, the accumulated weight of having been wrong in consequential situations, learning why, and recalibrating. You cannot retrieve your way to judgment. You build it through deliberate practice under real conditions, over time, with skin in the game that a model structurally cannot have.

The problem isn’t that AI handles retrieval well. The problem is that retrieval output now sounds so much like judgment output that the gap between them has become nearly invisible, especially to people who haven’t yet built enough judgment to know the difference.

The Judgment Stack

Think about expertise as a stack, not a spectrum.

Layer 1 is retrieval - synthesis, pattern vocabulary, volume processing, surface recognition. This is AI territory, and handing work in this area over to an AI is not weakness but correct resource allocation. The practitioner who uses an LLM to compress a competitive analysis that would have taken three hours into forty minutes isn’t cutting corners, they’re buying back time to do the work that actually compounds.

Layer 2 is the interface layer - hypothesis formation, question quality, contextual filtering, knowing which output to trust and which to interrogate. This is where the leverage actually lives, and it’s fundamentally human-plus-AI territory. Your prompt quality is a direct proxy for your judgment quality.
Two practitioners can feed the same LLM the same general problem and get outputs that are miles apart in usefulness, because one of them knows what a good answer looks like before they ask the question, and that foreknowledge doesn’t come from the model but from Layer 3 working backward.

Layer 3 is consequence and context - the ability to recognize when a pattern that has always worked is about to break, to assess novel situations that don’t map cleanly to anything in the training data, to hold strategic framing steady under pressure when the data is ambiguous. This is human territory, not because AI couldn’t theoretically develop something like it, but because it requires something a deployed model structurally cannot have: skin in the game, real consequence, the accumulated scar tissue of being wrong when it mattered and having to carry that forward.

The critical thinking crisis everyone is diagnosing right now is not, at its root, an AI problem but a Layer 2 collapse. People skip directly from Layer 1 retrieval to Layer 3 claims, bypassing the judgment infrastructure entirely. Layer 1 output is fluent, confident, and often correct enough to pass casual scrutiny, which keeps the gap invisible right up until someone asks a follow-up the model didn’t anticipate, and the person has no independent footing to stand on.

What SEO is actually revealing

SEO is a useful diagnostic here because the industry has always been an early signal for how the broader marketing world processes technological disruption. We were the first to chase algorithmic shortcuts at scale. We were the first to industrialize content in ways that traded quality for volume. And right now we are watching two distinct practitioner populations diverge in real time, with the gap between them widening faster than most people have noticed.

The first population is using LLMs as answer machines: feed the problem in, take the output out, ship it. Ask the model what’s wrong with a site’s rankings.
Ask it to write the content strategy. Ask it to explain why traffic dropped. This isn’t entirely without value, since Layer 1 retrieval has genuine utility even here, but the practitioners operating purely at this layer are making a trade they may not fully understand yet. They are outsourcing the only part of the job that compounds in value over time. Every hard problem they hand off to a model without first attempting to reason through it themselves is a training repetition they didn’t take, a weight they didn’t lift, and those repetitions are how Layer 3 gets built. You want the muscle? You have to do the work.

The second population is using LLMs as reasoning partners. They come to the model with a hypothesis already formed, a question already sharpened by their own thinking, and they use the output to pressure-test their reasoning, surface considerations they may have missed, and accelerate the parts of the work that don’t require their hard-won judgment, which frees them to apply that judgment more deliberately where it matters. These practitioners are getting faster and better simultaneously, because the model is amplifying something that already exists.

The difference between these two groups has nothing to do with tool access, since they are using the same tools, and everything to do with what each practitioner brings to the model before they open it.

The leveling lie

The argument for AI as a leveling tool is not wrong, it’s just incomplete, and that incompleteness is where the damage happens. A junior practitioner today has access to a compression of the field’s knowledge that would have been unimaginable five years ago. Ask an LLM about crawl budget allocation, entity relationships, structured data implementation, or the mechanics of how retrieval-augmented systems weight freshness signals, and you will get a coherent, usually accurate answer in seconds.
That is a genuine democratization of Layer 1, and dismissing it as illusory is its own form of gatekeeping.

But Layer 1 access is not expertise. It is the vocabulary of expertise, and there is a specific kind of danger in having the vocabulary before you have the understanding, because fluency masks the gap. You can discuss the concepts. You can deploy the terminology correctly. You can produce output that looks like the work of someone with deep experience, and you can do all of that while having no independent capacity to evaluate whether what you just produced is actually right for the situation in front of you.

This is not a character flaw but a metacognitive failure, the condition of not knowing what you don’t yet know. The junior practitioner using an LLM to accelerate their access to field knowledge isn’t being lazy. In many cases they are working hard and genuinely trying to develop. The problem is that Layer 1 fluency generates a confidence signal that isn’t calibrated to actual capability. The model doesn’t tell you when you’ve hit the edge of what it knows. It doesn’t flag the situations where the standard answer breaks down. It doesn’t know what it doesn’t know either, and neither do you yet, and that combination is where well-intentioned work quietly goes wrong.

The leveling effect is real, but the ceiling on it is lower than most people assume. What gets leveled is access to the knowledge layer. What doesn’t get leveled (what cannot be compressed or transferred through any tool) is the judgment architecture that determines what you do with that knowledge when the situation doesn’t follow the pattern. The practitioners who understand this distinction will use AI to accelerate their development. The ones who don’t will use it to feel further along than they are, right up until the moment a genuinely novel problem requires something they haven’t built yet.

Where the abdication actually happens

Let’s be precise about this, because the accusation of abdication usually gets thrown around in ways that are more emotional than useful. Using AI at Layer 1 is not abdication. Letting a model handle competitive analysis synthesis, first-draft content frameworks, technical audit pattern recognition, or structured data generation is correct delegation, since these are retrievable tasks and doing them manually when a better tool exists isn’t intellectual virtue but inefficiency pretending to be rigor.

Abdication happens at a specific and different point. It happens when you stop taking the problems that would have built your Layer 3 judgment and start routing them directly to a model instead: not because the model’s output isn’t useful, but because the attempt itself was the point. The struggle to formulate an answer to a hard problem, even an incomplete or wrong answer, is the mechanism by which judgment gets built. Hand that struggle off consistently and you are not saving time but spending something you may not realize you’re spending until it’s gone.

This is the part of the conversation that doesn’t get said clearly enough: the low-consequence training repetitions are how you prepare for the high-consequence moments. A practitioner who has reasoned through hundreds of traffic anomalies, content decay patterns, and crawl architecture decisions (even inefficiently, even wrongly at first) has built something that cannot be replicated by having asked an LLM to reason through those same problems on their behalf, because the model’s reasoning is not your reasoning, just as watching someone else lift the weight does not build your muscle.

The senior practitioners who feel their position eroding right now are often misdiagnosing the threat.
The threat isn’t that AI makes their knowledge less valuable, since genuine Layer 3 judgment is actually more valuable in an AI-saturated environment, not less, precisely because it becomes rarer as more people mistake Layer 1 fluency for the whole stack. The real threat is that the market hasn’t developed clean signals yet for distinguishing Layer 3 capability from Layer 1 fluency dressed up convincingly. It’s a signal problem that is temporary and will resolve itself in the most public and consequential ways possible - in front of clients, in front of leadership, in front of the situations where someone needs to make a call the model can’t make.

The answer for experienced practitioners is not to resist AI but to use it in ways that continue building Layer 3 rather than substituting for it. Use the model to go faster on Layer 1, and use the time that buys you to take on harder problems at Layer 2 and 3 than you could have reached before. The ceiling on your development just got higher, and whether you use that is a choice.

The answer for junior practitioners is harder but more important: understand that the shortcut doesn’t shorten the path but changes the surface underfoot. You can move across the terrain faster with better tools, but the terrain still has to be crossed, and there is no prompt that builds the judgment architecture for you. Only doing the work, being wrong in situations that matter, and carrying that forward builds that.

The prerequisite

Critical thinking is not the alternative to AI use. Instead, it is the prerequisite for AI use that compounds. Without it, you are operating entirely at Layer 1, fluent and fast and increasingly indistinguishable from everyone else who has access to the same tools you do, and everyone has access to the same tools you do. The tools are not the differentiator and never were, serving instead as a floor, and that floor is rising under everyone’s feet simultaneously.

What compounds is judgment.
The accumulated capacity to ask better questions than the person next to you, to recognize the moment when the standard pattern breaks, to hold a strategic position steady when the data is ambiguous and the pressure is real. That capacity doesn’t live in the model but in the practitioner, built over time through deliberate practice under real conditions, and it is the only thing in The Judgment Stack that gets more valuable as the tools get better.

The interview rooms where qualified candidates go quiet when asked to reason out loud are not showing us a technology problem. They are showing us what happens when a generation of practitioners optimizes for Layer 1 output without building the infrastructure underneath it, accumulating the vocabulary without the architecture and the fluency without the foundation.

The practitioners who will matter in three years are building that foundation right now, using every tool available to go faster at Layer 1 and using the time that buys them to go deeper at Layer 3 than was previously possible. They are not choosing between AI and thinking but using AI to think harder than they could before, and that is not a leveling effect but a compounding one…and compounding, as anyone who has spent serious time in this industry understands, is an advantage worth building.

If this maps to something you’re seeing in your team or organization (practitioners who look capable until the pressure is real, capability gaps you can’t quite locate) drop a comment below or reach out. I’d like to hear what you’re observing.

For the broader framework of how AI is reshaping brand visibility, trust, and discovery, The Machine Layer [https://www.amazon.com/Machine-Layer-Visible-Trusted-Search/dp/B0G2WZKM59/ref=sr_1_1] expands this thinking across the full landscape. Available on Amazon.
Get full access to Duane Forrester Decodes at duaneforresterdecodes.substack.com/subscribe [https://duaneforresterdecodes.substack.com/subscribe?utm_medium=podcast&utm_campaign=CTA_4]

All episodes

8 episodes

AI Search Runs on Two Memory Systems. The Platforms Don’t Use Them the Same Way.

Referenced in this episode:

When the Training Data Cutoff Becomes a Ranking Factor (Duane Forrester Decodes)
https://duaneforresterdecodes.substack.com/p/when-the-training-data-cutoff-becomes
The companion piece this episode builds on, where I first laid out the parametric-versus-retrieval distinction and what it means for timing.

How Perplexity finds and chooses its sources (Search Engine Journal)
https://www.searchenginejournal.com/perplexity-ai-interview-explains-how-ai-search-works/565395/
Background on why Perplexity runs a live search on essentially every query rather than answering from memory.

Google's AI optimization guidance, and why AI Search is still Search (DemandSphere)
https://www.demandsphere.com/blog/google-ai-optimization-guide-ai-search-is-still-search/
Support for the point that AI Overviews and AI Mode are served off the core Search index, not from Gemini's parametric memory.

Claude web search tool documentation (Anthropic)
https://platform.claude.com/docs/en/agents-and-tools/tool-use/web-search-tool
Primary source showing Claude's web search runs as a tool the model invokes only when it decides a question needs it.

Manage public web access in Microsoft 365 Copilot (Microsoft Learn)
https://learn.microsoft.com/en-us/microsoft-365/copilot/manage-public-web-access
The admin control behind the point that, on Copilot, whether retrieval happens at all can be a tenant policy setting.

Stop Treating AI Visibility as One Problem (Duane Forrester Decodes)
https://duaneforresterdecodes.substack.com/p/stop-treating-ai-visibility-as-one
The earlier governed-visibility piece this episode zooms into, treating retrieval as one of three layers to manage.

ChatGPT search behavior, clickstream insights (Semrush)
https://www.semrush.com/blog/chatgpt-search-insights/
The study behind the stat that ChatGPT's share of search-triggering sessions swung between roughly 15 and 66 percent as models updated.

Lost in the Middle: How Language Models Use Long Contexts (arXiv)
https://arxiv.org/abs/2307.03172
The foundational research on models using long context unevenly, behind the point that being retrieved isn't the same as being used well.

How up to date is ChatGPT, and how knowledge cutoffs work (JustDone)
https://justdone.com/blog/ai/how-up-to-date-is-chatgpt
Context for the training-cadence point that providers now ship frequent point releases, each carrying its own cutoff.

The Machine Layer (Amazon)
https://www.amazon.com/Machine-Layer-Visible-Trusted-Search/dp/B0G2WZKM59/ref=sr_1_1
My book, for the longer argument on why visibility, trust, and machine-readability are converging into one problem.

Yesterday · 14 min

You Can Finally Measure Content Alignment. That’s the Dangerous Part.

References from this episode:

When I mentioned Gerard Salton's SMART system at Cornell, the foundational vector space model work from the 1960s, here's the background on that: https://en.wikipedia.org/wiki/SMART_Information_Retrieval_System

The Netflix study I referenced on cosine similarity producing arbitrary results in embedding models, that's the 2024 paper from Steck, Ekanadham, and Kallus: https://research.netflix.com/publication/is-cosine-similarity-of-embeddings-really-about-similarity

The MTEB benchmark leaderboard, where you can see the performance spread across current embedding models: https://huggingface.co/spaces/mteb/leaderboard

Goodhart's Law, the "when a measure becomes a target it ceases to be a good measure" concept: https://en.wikipedia.org/wiki/Goodhart%27s_law

The vector index hygiene piece I referenced from last year: https://duaneforresterdecodes.substack.com/p/vector-index-hygiene-a-new-layer

The written version of this episode is available as the full article on this same Substack.

31 May 2026 · 19 min

Data for Decisions and Evidence for your conversations.

The Machine Layer - available at Amazon now. [https://www.amazon.com/Machine-Layer-Visible-Trusted-Search/dp/B0G2WZKM59/ref=sr_1_1]

26 May 2026 · 18 min

You’re Using AI at the Execution Layer. The Value Is in the Judgment Layer.

The tools are deployed. The licenses are paid. And if you’re a senior SEO or GEO practitioner right now, you’re probably using AI every day - for drafts, for summaries, for first passes at content that used to take twice as long. That’s real productivity, and it’s not nothing. It’s also not the return the investment is capable of producing. And the gap between what you’re getting and what’s available isn’t a tool problem. It’s a mode problem.

A peer-reviewed study published at the 2025 ASIS&T Annual Meeting [https://asistdl.onlinelibrary.wiley.com/doi/10.1002/pra2.1253] by Tim Gorichanaz at Drexel University gives that problem a name (h/t to Shari Thurow [https://www.linkedin.com/in/shari-thurow/] for pointing me at this paper!). Analyzing 205 real-world ChatGPT use cases, Gorichanaz identified six distinct modes in which people actually use AI: Writing, Deciding, Identifying, Ideating, Talking, and Critiquing. The data came from Reddit and skews Anglophone, which limits its generalizability, but the taxonomy it produced maps uncomfortably well onto how most practitioners are actually working.

Two modes dominate. Four are being left on the table. The four being left are the ones that determine whether AI makes you more strategically valuable or just faster at execution-layer work. That distinction matters more right now than it has at any prior point in this industry’s history.

The Two Modes Everyone Defaults To

Writing was the largest category in Gorichanaz’s data at 47% of observed use cases - drafting, editing, summarizing, translating, generating. McKinsey’s 2025 State of AI survey [https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai] confirms this at the enterprise level: the most commonly reported AI use cases are content drafting and information capture, and 63% of organizations using generative AI apply it primarily to create text.
Identifying - explaining something, answering a factual question, summarizing a document - was another 10% of the study’s data, and represents the other pillar most practitioners have built their AI workflow around. Research a topic, get a synthesis, move to the next task.

Together these two modes account for the overwhelming majority of how AI is being used, by practitioners and enterprises alike. Both have real value, yet neither is where the leverage is. And if your AI practice begins and ends there, you’re using an increasingly sophisticated tool to do work that was already being automated - just faster and at higher volume.

The other four modes (Deciding at 21% of Gorichanaz’s sample, Ideating at 9%, Talking at 8%, and Critiquing at 6%) are where the work becomes irreplaceable. They’re also where almost no practitioner has built a deliberate workflow, because nobody handed them one, and the pressure to show immediate output has consistently crowded out the space to develop one.

The Decisions You’re Still Making Alone

In the practitioner’s week, Deciding-mode questions are everywhere: which queries actually have AI visibility exposure worth prioritizing right now, whether a brand’s retrieval problem is a content architecture problem or a sourcing and signal problem, how to allocate effort across a portfolio when both SEO and GEO need attention and the budget doesn’t stretch to cover both fully, when to escalate a visibility concern to leadership versus when to fix it in the work before anyone asks.

Most senior practitioners are currently solving these questions with experience and intuition. That’s not a failure: experience and intuition are genuinely valuable, and no AI replaces them. But AI used deliberately in Deciding mode adds something experience can’t provide on its own: a structured pressure-test of the assumptions underneath the decision, applied before the decision hardens. That requires more than a good question.
Deciding mode requires giving the AI the relevant context (competitive landscape, current visibility posture, historical performance, strategic constraints) and then treating what comes back as a genuine input to the decision rather than a draft to be skimmed and set aside. It requires a workflow that doesn’t yet exist in most practitioners’ practice, not because anyone blocked it, but because no one built the time or structure for it either.

The same McKinsey data makes clear what that gap costs at scale: 88% of organizations use AI, but only 6% qualify as high performers generating meaningful enterprise-wide impact, and high performers are 3.6 times more likely to have fundamentally reworked their workflows rather than simply deployed tools into existing ones. The pattern holds at the practitioner level. Faster output from an unreconstructed workflow is not the same thing as better decisions from a restructured one.

The Gaps Nobody Briefed

For SEO and GEO practitioners, Ideating mode has a specific application that most are not using and most should be: mapping the entity and authority gaps the brand hasn’t recognized yet. What angles of topical authority has the brand failed to establish that AI retrieval systems are currently filling from other sources? What community signals (forum discussions, aggregated reviews, third-party commentary) are shaping how LLMs represent the brand in response to category queries, and what would it take to shift them? What framings of the brand exist in model training data that the brand’s own content has never addressed or countered?

These are genuinely Ideating-mode questions. They’re also questions most practitioners have some version of in the back of their mind without a structured method for surfacing the answers.
AI used in Ideating mode, not “give me five content ideas” but a genuine iterative exploration with deliberate constraints and real willingness to follow the output somewhere the team hasn’t already been, is one of the most direct methods available for finding those gaps before a competitor or a client audit finds them first.

The barrier isn’t capability. It’s the difference between a Writing prompt with a list output and an actual Ideating session. The first takes two minutes. The second takes twenty, requires a different posture toward the tool, and produces something that can’t be replicated by anyone who didn’t do it. That asymmetry is where practitioner value gets built in the current environment, and most practitioners are not claiming it.

The Honest Read Your Team Won’t Give You

Critiquing is the mode with the most direct application to daily practice and the most organizational resistance, because it requires using AI to find problems in work the practitioner or their team has already invested in.

Used properly, Critiquing is how a senior practitioner catches what internal review missed. The weak entity claim in a content strategy that sounds authoritative but isn’t backed by the sourcing AI retrieval systems actually trust. The gap between what the brand says about itself across owned properties and what a well-prompted LLM surfaces when asked a category question the brand should own. The assumed premise in a GEO recommendation that made sense six months ago and is now contradicted by how retrieval patterns have shifted.

That last application is not abstract. Running your own brand (or a client’s brand) through a structured AI Critiquing session before the next strategy cycle is exactly the kind of proactive work that separates practitioners operating at the judgment layer from practitioners operating at the production layer.
It’s also the kind of work that changes the conversation with a client or a leadership team, because you’re surfacing problems before they become visible in the data rather than explaining them after the fact.

The reason Critiquing is underused isn’t a governance problem. It’s a disposition problem. Organizations and practitioners have broadly trained themselves to use AI to produce output, not to interrogate it. Reversing that habit is a choice, and it’s one of the more consequential choices available to a senior practitioner right now.

Rehearsal

The Talking mode in Gorichanaz’s taxonomy covers AI as a conversation partner, and for practitioners, the most valuable version of that is rehearsal for the internal and client conversations where the stakes are real. The client call where you have to explain why organic traffic is down 30% while AI search visibility is also poor, and you need to hold two separate causal explanations simultaneously without letting them collapse into a single narrative that oversimplifies both. The internal briefing where you have to make the case for GEO investment alongside existing SEO budget to a leadership team that still conflates the two disciplines and wants a single number that explains the ROI of both. The agency or vendor review where you need to push back on a recommended approach without losing the relationship.

These conversations are recurring and high-stakes, and most practitioners walk into them with only their own mental rehearsal as preparation. Talking mode (role-playing the pushback, asking the AI to argue the other side, running through the version of the conversation that goes wrong) is not a replacement for experience. It is a preparation method that costs twenty minutes and materially changes the quality of the practitioner who walks into the room. It doesn’t produce an artifact. It doesn’t show up in a utilization report.
EY’s 2025 Work Reimagined Survey [https://www.ey.com/en_us/insights/workforce/work-reimagined-survey], which covered 15,000 employees and 1,500 employers across 29 countries, found that 88% of employees use AI at work, but only 5% use it in ways that fundamentally transform what they produce. The reason that gap is so wide is almost certainly that the advanced modes - Critiquing, Deciding, Talking - don’t produce something measurable in the moment. They produce a better practitioner over time, which is a return that compounds and doesn’t appear in a dashboard.

What Mode You’re In Is What Layer You’re On

The six-mode taxonomy maps almost exactly onto the split between execution-layer work and judgment-layer work. Writing and Identifying are execution-layer modes. They’re valuable, they’re visible, and they’re increasingly the modes that AI handles with less and less human involvement. Deciding, Ideating, Critiquing, and Talking are judgment-layer modes. They’re where the practitioner’s irreplaceability lives.

A senior SEO or GEO practitioner who uses AI only in Writing and Identifying mode is, functionally, positioning themselves as an execution-layer worker at exactly the moment when AI is most aggressively compressing that layer. That’s not a prediction about job displacement. It’s an observation about professional differentiation. The practitioners building durable value in this environment are the ones using AI to make their judgment better, not just their output faster.

Gorichanaz’s study reframes what information need actually means in the AI era: not just question-answering or uncertainty reduction, but what the author calls skillfully coping in the world, meaning the ongoing application of practical intelligence to situations requiring both understanding and action. For a senior practitioner, that framing is a useful diagnostic. The question isn’t what AI can do.
It’s which parts of your work require the kind of practical intelligence that compounds with experience, and whether your current AI practice is making that intelligence sharper or just making everything around it move faster.

McKinsey’s workplace research [https://www.mckinsey.com/capabilities/tech-and-ai/our-insights/superagency-in-the-workplace-empowering-people-to-unlock-ais-full-potential-at-work] finds that only 1% of leaders call their companies mature on AI deployment, meaning AI is fully integrated into workflows and driving substantial business outcomes. The practitioner-level version of that gap is just as wide, and just as fixable.

If you mapped your actual AI usage against the six modes this week (not what you intend to do, what you actually did), how would the distribution look? How much was Writing and Identifying? How much was Deciding, Ideating, Critiquing, Talking?

The practitioners who close that gap deliberately, who build even a minimal workflow around the judgment-layer modes, are not doing something exotic. They’re doing something most of their peers are not. In a discipline where the execution layer is getting compressed by the same tools everyone has access to, that gap is the one worth closing first.

If this framing connects to work you’re navigating, I’d like to hear about what you’re seeing. And if you want to go deeper on the structural layer beneath all of this, The Machine Layer [https://www.amazon.com/Machine-Layer-Visible-Trusted-Search/dp/B0G2WZKM59/ref=sr_1_1] is where that conversation continues.

And before you go, just a heads up that I have a special announcement coming on Tuesday this week, just 2 days from now! You’ll get an extra email and podcast this week. Thanks for your time, everyone, and I’ll be back soon.

Thanks for reading! This post is public so feel free to share it.
Get full access to Duane Forrester Decodes at duaneforresterdecodes.substack.com/subscribe [https://duaneforresterdecodes.substack.com/subscribe?utm_medium=podcast&utm_campaign=CTA_4]

17 min · 24 May 2026

LLM Guidance Doesn’t Port the Way SEO Guidance Did

For roughly two decades, the SEO discipline operated on a quiet assumption that turned out to be one of its most valuable features. Guidance from one search engine traveled. If Google said sitemaps mattered, Bing said sitemaps mattered. If Bing said structured data deserved real effort, Google said the same. Practitioners optimized for Google with reasonable confidence that the work would carry across the other engines, and most of the time it did. That portability was not luck. It was the product of a structurally large overlap layer that the major search engines had jointly built, brick by brick, over twenty years.

That world doesn’t exist in LLM-land. The major providers train on different corpora, run different crawlers under different policies, route different queries through different retrieval systems, and apply different alignment processes that shape the final response in ways the upstream signals can’t predict. Guidance from any one provider, including Google’s guidance about its own Gemini products, is one data point. Practitioners carrying the SEO habit forward, the habit of treating one engine’s guidance as roughly the whole map, will optimize confidently for one platform and miss the others.

Sidebar: As I was finalizing this piece, Google published fresh guidance on optimizing for their generative AI features [https://developers.google.com/search/docs/fundamentals/ai-optimization-guide]. Their framing is explicit: from Google Search's perspective, optimizing for AI search is still SEO. That framing is accurate for Google Search. It does not extend to ChatGPT, Claude, Perplexity, or any other LLM, and that is precisely the trap this article is about.

The shared standards that made SEO guidance portable

The era of portable guidance was built on actual collaboration, not coincidence.
The Sitemaps protocol [https://www.sitemaps.org/] became the joint property of Google, Yahoo, and Microsoft in November 2006, when the three engines formally agreed to support a common protocol at version 0.90, building on Google’s earlier Sitemaps 0.84 from June 2005. Five years later, on June 2, 2011, the same three engines launched Schema.org [https://blogs.bing.com/search/June-2011/Introducing-Schema-org-Bing,-Google-and-Yahoo-Uni], with Yandex joining shortly after, to create a common vocabulary for structured data markup. That was the announcement that got made on stage at SMX Advanced. I was on the Bing team at the time, and what struck me then is what still matters now. The engines were competitors, but they had decided that a shared vocabulary served them all. Webmasters got one set of rules. The web got cleaner data. The engines got better signals. Everybody won.

The pattern repeated with robots.txt [https://www.rfc-editor.org/rfc/rfc9309.html], the 1994 convention that became RFC 9309 at the IETF in 2022, formalizing what every serious crawler already honored. And it repeated again, more recently, with IndexNow [https://www.indexnow.org/], the protocol Microsoft Bing and Yandex launched in October 2021. IndexNow is now supported by Bing, Yandex, Naver, Seznam, and Yep. Google has tested the protocol since 2021 but has not adopted it.

That overlap layer is exactly why Google’s guidance felt safe to follow even if you cared about Bing traffic. The signals the engines used were not identical, but the inputs they accepted, the protocols they honored, and the standards they advertised were. Optimization had a shared substrate.

Where the LLM stacks actually diverge

The LLM environment doesn’t have a shared substrate of comparable size. The differences are not cosmetic and they are not temporary. They are baked into how the systems are built.

Start with training data.
OpenAI has signed disclosed licensing deals with News Corp worth up to $250 million over five years [https://everything-pr.com/ai-licensing-tracker/], Axel Springer [https://openai.com/index/axel-springer-partnership/] at roughly $13 million per year, Reddit [https://www.cjr.org/analysis/reddit-winning-ai-licensing-deals-openai-google-gemini-answers-rsl.php] at an estimated $70 million per year, plus the Financial Times, Condé Nast, Hearst, Vox Media, The Atlantic, the Associated Press, Le Monde, and others. Google has its own Reddit deal estimated at $60 million per year granting real-time data API access. Anthropic has not publicly disclosed equivalent publisher licensing deals, and that undisclosed status is itself the practitioner-facing point. The corpora that fed these models, and that continue to refresh them, are not the same documents. Practitioners cannot know what any given provider has paid for and what it hasn’t.

The crawler infrastructure diverges next. OpenAI runs three separate bots [https://platform.openai.com/docs/bots]: GPTBot for training, OAI-SearchBot for search indexing, and ChatGPT-User for user-initiated retrieval. Anthropic runs three of its own [https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler]: ClaudeBot for training, Claude-SearchBot for search, and Claude-User for user-initiated retrieval. Perplexity runs PerplexityBot and Perplexity-User. Google introduced Google-Extended in September 2023 as the user-agent that controls whether Google can use a site’s content to train Gemini, separate entirely from the Googlebot that handles traditional search indexing. There is no single AI user-agent. Every provider requires a separate rule, and the rules don’t translate cleanly across providers because the bots don’t do equivalent jobs in equivalent ways.

The retrieval architectures diverge structurally.
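To make the "every provider requires a separate rule" point concrete, here is a minimal sketch that checks one robots.txt against each AI user-agent named above. The robots.txt content and the example.com URL are hypothetical, and Python's standard-library parser is only an approximation: each provider's crawler applies its own matching logic, so treat the result as a first-pass audit rather than a guarantee of real crawler behavior.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the text. Each needs its own rule.
AI_USER_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
    "Google-Extended",
]

# Hypothetical robots.txt: opts out of training use, leaves search and
# user-initiated retrieval open. Not taken from any real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def access_by_agent(robots_txt: str, url: str) -> dict[str, bool]:
    """Return, per AI user-agent, whether this robots.txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_USER_AGENTS}


if __name__ == "__main__":
    report = access_by_agent(SAMPLE_ROBOTS_TXT, "https://example.com/pricing")
    for agent, allowed in report.items():
        print(f"{agent:<18} {'allowed' if allowed else 'blocked'}")
```

With this sample file, the three training-related tokens are blocked while the search and user-initiated agents fall through to the wildcard rule. That is one policy expressed as three separate rules, and it still says nothing about providers not listed.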
ChatGPT has historically used Bing’s index [https://yoast.com/chatgpt-search/] as its primary web search source, and that connection appears to still be primary, though OpenAI continues to build out additional infrastructure alongside it. Perplexity built its retrieval system on a Vespa-based pipeline that treats documents and sub-document chunks as first-class retrievable units. Google’s Gemini uses Google’s own index plus Knowledge Graph grounding. Claude uses Brave Search as a retrieval partner. Same query, four different retrieval systems, four different views of which sources exist and which sources are worth surfacing.

Then comes the alignment layer, which is where SEO had no equivalent at all. After a model is trained on its corpus, providers run post-training to shape how the model actually behaves: tone, refusal patterns, format, safety posture, what counts as a good answer. OpenAI’s primary approach has been RLHF, or Reinforcement Learning from Human Feedback [https://arxiv.org/abs/2203.02155], where human raters score model outputs and the model learns to produce highly rated responses. Anthropic developed Constitutional AI, which trains models to critique and revise their own outputs against a written set of principles. These methodologies produce demonstrably different behavior in the final products. The same retrieved content, fed into two models aligned by two methodologies, can yield two materially different responses about the same brand.

When one provider’s guidance demonstrably fails to port

The clearest single example of guidance that doesn’t port is llms.txt [https://llmstxt.org/]. Jeremy Howard of Answer.AI proposed the file in September 2024 as a markdown manifest, placed at a site’s root, that would guide LLMs to the most important content. The proposal got picked up across the SEO community. Yoast built a generator. Agencies added llms.txt creation to their service catalogs. Conference speakers declared it essential.
As of mid-2026, no major LLM provider has confirmed they consume the file [https://ahrefs.com/blog/what-is-llms-txt/]. Not OpenAI. Not Anthropic. Not Google. Server-log analyses across hundreds of thousands of domains show major AI crawlers don’t routinely request /llms.txt at all. Google’s John Mueller publicly compared it to the deprecated meta keywords tag [https://www.cshel.com/ai-seo/no-llms-txt-is-not-the-new-meta-keywords/]. Gary Illyes confirmed at Search Central Live in July 2025 that Google does not support llms.txt and is not planning to.

I’ve written about this elsewhere [https://duaneforresterdecodes.substack.com/p/llmstxt-the-webs-next-great-idea], so I won’t repeat the technicalities here. What matters for this argument is the structural lesson. Schema.org succeeded because three engines built it together and then enforced it together. llms.txt was proposed by one researcher, picked up by tooling vendors, and ignored by the platforms it was supposed to serve. The shared-standards model that gave SEO its portable guidance is not available to LLM practitioners at the same scale, because the platforms are not building the standards together. They are building their own pipelines.

The Gemini inversion

The cleanest illustration of how far guidance portability has degraded sits inside one company. Google publishes its own SEO documentation [https://developers.google.com/search/docs] at Search Central, the canonical guidance the industry has followed for two decades. Those documents emphasize traditional ranking signals, E-E-A-T, content quality, technical accessibility, and structured data. That guidance is still useful for Google Search itself.

Google also makes Gemini, the model that powers AI Overviews and Google’s separate AI Mode surface. And the citation behavior of those surfaces does not appear to track the guidance the same company publishes for its own search results.
In late 2024, roughly three-quarters of pages cited in AI Overviews [https://ahrefs.com/blog/ai-overview-citations-top-10/] also ranked in Google’s top 12 for the same query. By early 2026, after Google upgraded AI Overviews to Gemini 3 in January, Ahrefs analyzed 4 million AI Overview URLs and found that only 38% of cited pages also appeared in the top 10 for the same query. A separate BrightEdge analysis [https://www.searchenginejournal.com/google-ai-overview-citations-from-top-ranking-pages-drop-sharply/568637/] put the overlap closer to 17%. SE Ranking’s post-upgrade work found that Gemini 3 replaced approximately 42% of the domains previously cited under earlier model versions and generates 32% more sources per response.

The gap widens further when you look at Google’s AI Mode, which is a separate conversational surface that runs on the same Gemini family. Semrush data [https://whitehat-seo.co.uk/blog/ai-engines-comparison-citations] shows AI Mode and AI Overviews reach semantically similar conclusions 86% of the time, but cite the same URLs only 13.7% of the time. Only 14% of AI Mode citations rank in Google’s traditional top 10.

It appears, so far, that the canonical relationship has shifted. Google’s published SEO guidance is still the cleanest path to ranking in Google Search. But that ranking is no longer a reliable proxy for being cited by Google’s own AI surfaces. The same guidance, the same content, the same domain, can produce three meaningfully different outcomes across Google Search, AI Overviews, and AI Mode, even though all three live inside the same company. The old playbook of following the search engine’s guidance and trusting that the engine’s other surfaces would behave consistently does not appear to be delivering the same returns it used to.

What still ports, and why it’s smaller than it looks

A universal layer does survive. Crawler accessibility still matters across every provider.
Primary-source factual content still wins more citations than aggregator restatement. Clean retrievable structure still helps every system understand what a page is about. Presence on the high-authority sources that all major LLMs disproportionately cite (Wikipedia, YouTube, Reddit, major news outlets) still functions as a force multiplier across platforms. Earning visibility on those sources gives content a chance to surface in any LLM that draws on them.

But the universal layer is much smaller than it was in the SEO era. Qwairy’s analysis of 118,000 AI responses [https://whitehat-seo.co.uk/blog/ai-engines-comparison-citations] across ChatGPT, Perplexity, Google AI Mode, and Claude found that only 11% of cited domains appeared across multiple platforms. The other 89% were platform-specific. A brand that wins citations on Perplexity may be largely invisible on Claude. A brand that’s a regular reference on ChatGPT may not show up in AI Overviews at all. The same content can be the right answer for one system and the wrong answer for the system next to it.

What this means for the work

The practical implication is not abandoning all hope. It is that practitioners need to stop treating any single LLM provider’s guidance as the universal map and start treating it as one input among several. Read what every major provider publishes about their own systems. Test your visibility across platforms, not just on the platform you happen to use most. Treat divergence as the default and overlap as the exception, not the other way around.

This is not how SEO worked, and the difference matters. The old reflex was to optimize for Google and trust the portability. The new reality is that following one LLM’s guidance, even Google’s guidance about Gemini, will leave you optimized for a slice of the landscape and potentially blind to the rest.
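For anyone acting on the "test your visibility across platforms" advice, the overlap figure is simple to compute on your own sample. The sketch below assumes you have already collected the cited domains per platform. The domain lists are invented placeholders, and the definition used (share of distinct domains cited on two or more platforms) is one plausible reading of "appeared across multiple platforms," not necessarily the methodology behind the 11% figure.

```python
from collections import Counter


def multi_platform_share(citations: dict[str, set[str]]) -> float:
    """Share of distinct cited domains that appear on two or more platforms."""
    # Count how many platforms cite each domain (sets avoid double counting).
    platform_counts = Counter(
        domain for domains in citations.values() for domain in domains
    )
    if not platform_counts:
        return 0.0
    shared = sum(1 for n in platform_counts.values() if n >= 2)
    return shared / len(platform_counts)


# Hypothetical sample data, for illustration only.
SAMPLE_CITATIONS = {
    "ChatGPT": {"wikipedia.org", "example-a.com", "example-b.com"},
    "Perplexity": {"wikipedia.org", "example-c.com", "example-d.com"},
    "Claude": {"reddit.com", "example-e.com"},
    "Google AI Mode": {"reddit.com", "example-f.com", "example-g.com"},
}

if __name__ == "__main__":
    share = multi_platform_share(SAMPLE_CITATIONS)
    print(f"Domains cited on 2+ platforms: {share:.0%}")
```

Tracking that share over time, on your own query set, is a cheap way to see whether the overlap in your niche is growing or shrinking.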
The discipline is being rebuilt on platform-specific work that didn’t exist in the SEO era, and the practitioners who recognize that first are going to spend the next two years setting the standards everyone else follows. The overlap shrank. You now have more work than ever to accomplish.

If you have thoughts on where the divergence between providers is sharpest in your own work, drop a comment below or reach out directly. I’d genuinely like to hear what’s showing up in the data.

For more on how the AI search environment is reshaping the practitioner discipline, The Machine Layer [https://www.amazon.com/Machine-Layer-Visible-Trusted-Search/dp/B0G2WZKM59/ref=sr_1_1] is on Amazon.

Thanks for reading! This post is public so feel free to share it. Get full access to Duane Forrester Decodes at duaneforresterdecodes.substack.com/subscribe [https://duaneforresterdecodes.substack.com/subscribe?utm_medium=podcast&utm_campaign=CTA_4]

17 min · 17 May 2026

AI Gives You the Vocabulary. It Doesn’t Give You the Expertise.
