Duane Forrester Decodes

Rank and AI Citation Aren’t the Same Number

16 min · June 14, 2026

Description

Resources and references

Query length, prompts vs. searches:
• SimilarWeb data on how much longer AI prompts run than Google queries: https://officechai.com/ai/chatgpt-queries-17x-longer-than-google-searches-6x-longer-than-googles-ai-mode-similarweb-data/
• Clickstream analysis on the gap between the typed prompt and the search the model actually fires: https://martech.org/chatgpt-growing-as-a-traffic-referrer-reshaping-search-behavior-report-says/
• Study on prompt decomposition, multiple retrieval searches per prompt: https://searchengineland.com/chatgpt-search-prompts-data-463407

Longtail and ranking:
• Why longtail is about specificity and search volume, not word count: https://www.yotpo.com/blog/long-tail-keywords-guide/
• On long, specific phrases being easier to rank for at modest authority: https://www.w3era.com/blog/seo/long-tail-keyword-strategy/
• On reading search volume as a starting point, not a verdict: https://www.outrank.so/blog/how-to-find-low-competition-keywords

Citation vs. organic overlap:
• Moz, on most AI Mode citations not appearing in the organic results for the same query: https://thenextweb.com/news/ai-changing-seo-tools
• ZipTie, on how few cited URLs land in Google's top ten: https://ziptie.dev/blog/how-different-ai-platforms-cite-the-same-source-differently/
• Semrush AI Mode study, including heavy Perplexity-Google overlap: https://www.semrush.com/blog/ai-mode-comparison-study/

How input shape moves what gets surfaced:
• Comparative analysis, AI sourcing shifting with the character of the query: https://arxiv.org/abs/2601.16858
• Study on outputs shifting when prompts are rephrased: https://arxiv.org/abs/2509.08919

Book: The Machine Layer: https://www.amazon.com/Machine-Layer-Visible-Trusted-Search/dp/B0G2WZKM59/ref=sr_1_1

Get full access to Duane Forrester Decodes at duaneforresterdecodes.substack.com/subscribe

All episodes

11 episodes

Microsoft Just Proved a Point About Search Today

Show Notes: Microsoft Just Proved a Point About Search Today

• Announcing Microsoft Web IQ (Bing Search Blog): https://blogs.bing.com/search/June-2026/Announcing-Microsoft-Web-IQ
• Introducing AI Performance in Bing Webmaster Tools, public preview (Bing Webmaster Blog): https://blogs.bing.com/webmaster/February-2026/Introducing-AI-Performance-in-Bing-Webmaster-Tools-Public-Preview
• New AI Visibility Insights in Bing Webmaster Tools: Intents, Topics, Citation Share, and Compare (Bing Search Blog): https://blogs.bing.com/search/June-2026/New-AI-Visibility-Insights-in-Bing-Webmaster-Tools-Intents-Topics-Citation-Share-Compare
• Jordi Ribas on Web IQ and how AI agents search (Search Engine Land): https://searchengineland.com/microsoft-releases-web-iq-powered-by-bing-but-designed-for-how-ai-agents-search-479194
• Microsoft Web IQ gives AI agents Bing grounding APIs (Search Engine Journal): https://www.searchenginejournal.com/microsoft-web-iq-gives-ai-agents-bing-grounding-apis/577736/
• The evolving role of the index: from ranking pages to supporting answers (Bing Search Blog): https://blogs.bing.com/search/May-2026/Evolving-role-of-the-index-From-ranking-pages-to-supporting-answers
• Rank and AI Citation Aren’t the Same Number (Duane Forrester Decodes): https://duaneforresterdecodes.substack.com/p/rank-and-ai-citation-arent-the-same
• The Machine Layer, book (Amazon): https://www.amazon.com/Machine-Layer-Visible-Trusted-Search/dp/B0G2WZKM59/ref=sr_1_1

June 28, 2026 · 16 min

81.8% of My “AI Assistant” Traffic Was Fake. The Googlebot Number Was Worse.

Show Notes: 81.8% of My “AI Assistant” Traffic Was Fake

Episode summary
Over two weeks, a brand-new website with zero promotion behind it logged thirty-three visits from AI assistants. Only six were real. The rest were lying about who they were, and the Googlebot numbers were worse. In this episode I walk through exactly what I found in my own server logs, how I proved each finding past the point of doubt, and the simple method you can run on your own logs this week to see your real numbers. We cover why a bot’s name is a claim and not an identity, the difference between bots that fetch you for a live answer and bots that crawl you to train tomorrow’s models, the one crawler I had to chase four different ways to nail down, and the one major player you structurally cannot measure at all.

What you’ll learn:
• Why the bot names in your analytics are a “claims to be” number, not a real one, and the one check that fixes it.
• The 81.8 percent spoof rate hiding in live AI-assistant traffic, and how the fakes gave themselves away.
• Why Googlebot showed 87 percent impersonation, and why that is an old story, not a new one.
• The difference between retrieval crawlers (today’s visibility) and training crawlers (whether the model knows you tomorrow).
• A repeatable, four-step way to settle any bot you cannot verify on the first pass.
• Why Gemini is the one source you cannot measure by name, and how that rhymes with Google’s old “(not provided)” move.

The numbers, at a glance:
• Live AI-assistant fetches: 33 claimed, 6 verified, 27 spoofed. An 81.8 percent spoof rate among the requests that could be checked.
• Googlebot: 799 claimed, 107 verified, 692 spoofed. Roughly 87 percent not Google.
• Most active verified crawlers: Anthropic’s ClaudeBot 166, Googlebot 107, OpenAI’s GPTBot 46, OpenAI’s search crawler 40.
• CCBot (Common Crawl): 20 claimed, 0 verified. Confirmed as impostors across four independent checks.

A reminder these are two weeks on one small, new site. The method is the point, not my totals.

The published IP-range lists (verify your own logs)
These are the first-party files each operator publishes. A request is only legitimate if its source IP falls inside the matching list. Each link goes straight to the source.

OpenAI
• ChatGPT-User (live user fetch): https://openai.com/chatgpt-user.json
• OAI-SearchBot (search / retrieval): https://openai.com/searchbot.json
• GPTBot (training): https://openai.com/gptbot.json

Anthropic (one file covers all of their bots, including ClaudeBot and Claude-User)
• https://claude.com/crawling/bots.json

Perplexity
• Perplexity-User (live user fetch): https://www.perplexity.com/perplexity-user.json
• PerplexityBot (crawler): https://www.perplexity.com/perplexitybot.json

Google (note: Google moved these to the /crawling/ipranges/ path in 2026, and the old URLs fail quietly)
• Common crawlers, including Googlebot: https://developers.google.com/static/crawling/ipranges/common-crawlers.json
• Special-case crawlers: https://developers.google.com/static/crawling/ipranges/special-crawlers.json
• User-triggered agents: https://developers.google.com/static/crawling/ipranges/user-triggered-agents.json

Common Crawl
• CCBot: https://index.commoncrawl.org/ccbot.json

Verification and proof resources
• Google’s full crawler and fetcher reference, where Google states that Google-Extended has no separate user agent and is a robots.txt control token, not a fetcher, and where Google itself warns the user agent string can be spoofed: https://developers.google.com/crawling/docs/crawlers-fetchers/google-common-crawlers
• Google’s guide to verifying that a request really came from Google, including the reverse-DNS method: https://developers.google.com/crawling/docs/crawlers-fetchers/verify-google-requests
• Common Crawl’s public index. Drop in a domain and a recent crawl to check whether your site is actually in the corpus. Use a wildcard, for example yoursite.com/*, so you are not just matching the homepage: https://index.commoncrawl.org/

Run it yourself: the four-step chase
When a bot will not verify on the first pass, do not stop at “unknown.” Do this:
1. Check the published IP list. Is the source address inside the operator’s ranges?
2. Check reverse DNS. Does the IP resolve back to the operator’s own hostname?
3. Check the corpus or index where one exists, like Common Crawl’s, to see if you were actually captured.
4. Run a WHOIS lookup on the raw IP to see who really owns it. Commodity hosting in random countries is your answer.
Four angles that agree is proof. One that does not is a thread worth pulling.

Try it and tell me. Run this on your own logs and send me two numbers: your demand spoof rate, and your Googlebot one. I suspect the real story is in the spread between them.

More on the question of what happens to your content after the fetch: https://www.citationiq.com

Follow the show so the next episode finds you.
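The first step of the chase, and the spoof-rate arithmetic behind the headline figures, can be sketched in a few lines of Python. This is a minimal illustration and not the method used in the episode: the sample range file, its "prefixes" / "ipv4Prefix" / "ipv6Prefix" keys, the function names, and the log rows are all assumptions made for the example, and the addresses are reserved documentation ranges. Check the layout of each operator's real file before relying on it.

```python
import ipaddress

# Hypothetical sample shaped like a published range file. The key names are
# an assumption about the layout; the addresses are documentation ranges.
SAMPLE_RANGE_FILE = {
    "prefixes": [
        {"ipv4Prefix": "192.0.2.0/24"},
        {"ipv6Prefix": "2001:db8::/32"},
    ]
}


def load_networks(range_file):
    """Collect every CIDR block in the file as an ip_network object."""
    networks = []
    for entry in range_file.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def is_verified(source_ip, networks):
    """True when the request's source IP falls inside a published range."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in networks)


def spoof_rate(claimed, verified):
    """Percentage of claimed requests that failed verification."""
    if claimed == 0:
        return 0.0
    return 100.0 * (claimed - verified) / claimed


networks = load_networks(SAMPLE_RANGE_FILE)

# Hypothetical log rows: (claimed user agent, source IP).
requests = [
    ("GPTBot", "192.0.2.17"),
    ("GPTBot", "203.0.113.9"),
    ("GPTBot", "2001:db8::1"),
]
verified = sum(is_verified(ip, networks) for _, ip in requests)
print(f"{verified} of {len(requests)} claimed requests verified")

# The totals quoted in the show notes.
print(round(spoof_rate(33, 6), 1))     # AI-assistant fetches: 81.8
print(round(spoof_rate(799, 107), 1))  # Googlebot: 86.6, roughly 87 percent
```

A request that fails this check is not automatically hostile; it is simply unverified, which is where reverse DNS, the corpus check, and WHOIS come in.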

June 21, 2026 · 19 min

AI Search Runs on Two Memory Systems. The Platforms Don’t Use Them the Same Way.

Referenced in this episode:

• When the Training Data Cutoff Becomes a Ranking Factor (Duane Forrester Decodes): https://duaneforresterdecodes.substack.com/p/when-the-training-data-cutoff-becomes
  The companion piece this episode builds on, where I first laid out the parametric-versus-retrieval distinction and what it means for timing.
• How Perplexity finds and chooses its sources (Search Engine Journal): https://www.searchenginejournal.com/perplexity-ai-interview-explains-how-ai-search-works/565395/
  Background on why Perplexity runs a live search on essentially every query rather than answering from memory.
• Google's AI optimization guidance, and why AI Search is still Search (DemandSphere): https://www.demandsphere.com/blog/google-ai-optimization-guide-ai-search-is-still-search/
  Support for the point that AI Overviews and AI Mode are served off the core Search index, not from Gemini's parametric memory.
• Claude web search tool documentation (Anthropic): https://platform.claude.com/docs/en/agents-and-tools/tool-use/web-search-tool
  Primary source showing Claude's web search runs as a tool the model invokes only when it decides a question needs it.
• Manage public web access in Microsoft 365 Copilot (Microsoft Learn): https://learn.microsoft.com/en-us/microsoft-365/copilot/manage-public-web-access
  The admin control behind the point that, on Copilot, whether retrieval happens at all can be a tenant policy setting.
• Stop Treating AI Visibility as One Problem (Duane Forrester Decodes): https://duaneforresterdecodes.substack.com/p/stop-treating-ai-visibility-as-one
  The earlier governed-visibility piece this episode zooms into, treating retrieval as one of three layers to manage.
• ChatGPT search behavior, clickstream insights (Semrush): https://www.semrush.com/blog/chatgpt-search-insights/
  The study behind the stat that ChatGPT's share of search-triggering sessions swung between roughly 15 and 66 percent as models updated.
• Lost in the Middle: How Language Models Use Long Contexts (arXiv): https://arxiv.org/abs/2307.03172
  The foundational research on models using long context unevenly, behind the point that being retrieved isn't the same as being used well.
• How up to date is ChatGPT, and how knowledge cutoffs work (JustDone): https://justdone.com/blog/ai/how-up-to-date-is-chatgpt
  Context for the training-cadence point that providers now ship frequent point releases, each carrying its own cutoff.
• The Machine Layer (Amazon): https://www.amazon.com/Machine-Layer-Visible-Trusted-Search/dp/B0G2WZKM59/ref=sr_1_1
  My book, for the longer argument on why visibility, trust, and machine-readability are converging into one problem.

June 7, 2026 · 14 min

You Can Finally Measure Content Alignment. That’s the Dangerous Part.

References from this episode:

• When I mentioned Gerard Salton's SMART system at Cornell, the foundational vector space model work from the 1960s, here's the background on that: https://en.wikipedia.org/wiki/SMART_Information_Retrieval_System
• The Netflix study I referenced on cosine similarity producing arbitrary results in embedding models, that's the 2024 paper from Steck, Ekanadham, and Kallus: https://research.netflix.com/publication/is-cosine-similarity-of-embeddings-really-about-similarity
• The MTEB benchmark leaderboard, where you can see the performance spread across current embedding models: https://huggingface.co/spaces/mteb/leaderboard
• Goodhart's Law, the "when a measure becomes a target it ceases to be a good measure" concept: https://en.wikipedia.org/wiki/Goodhart%27s_law
• The vector index hygiene piece I referenced from last year: https://duaneforresterdecodes.substack.com/p/vector-index-hygiene-a-new-layer

The written version of this episode is available as the full article on this same Substack.
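For a feel of how cosine similarity on learned embeddings can come out arbitrary, here is a toy sketch. It is not taken from the Netflix paper and only loosely mirrors the kind of effect it discusses: it assumes a model that scores user-item pairs by dot product, and the vectors and scale factors are invented for illustration. Rescaling each dimension in opposite directions on the two sides leaves every score untouched, yet the cosine similarity between the two items shifts.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of norms."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Made-up two-dimensional embeddings, purely for illustration.
users = [[1.0, 2.0], [3.0, 1.0]]
items = [[1.0, 0.5], [0.5, 1.0]]

# Rescale each dimension: divide on the user side, multiply on the item side.
scale = [10.0, 0.1]
users_scaled = [[x / s for x, s in zip(u, scale)] for u in users]
items_scaled = [[x * s for x, s in zip(i, scale)] for i in items]

# Every user-item score is unchanged, so the model's predictions are identical.
for user, user_scaled in zip(users, users_scaled):
    for item, item_scaled in zip(items, items_scaled):
        assert math.isclose(dot(user, item), dot(user_scaled, item_scaled))

# The item-to-item cosine similarity is not unchanged.
print(round(cosine(items[0], items[1]), 4))                # 0.8
print(round(cosine(items_scaled[0], items_scaled[1]), 4))  # 0.9999
```

Both sets of embeddings describe the same model, which is why a similarity score read off one of them deserves some caution before it becomes a target.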

May 31, 2026 · 19 min