We Asked 4 LLMs 30 Questions: Who Gets Cited for Best SEO Tool, AI Search Tools, and Other Buyer Queries. A 2026 Citation Study
TL;DR. We ran 30 buyer-intent prompts (tool selection, GEO strategy, agency procurement) through Claude, ChatGPT, Gemini, and Perplexity on May 21, 2026. Across 90 successful responses we tracked every brand, tool, and domain each model named. Three findings stand out: Semrush is the most-cited tool by a wide margin (named in 53% of tool-selection answers); Search Engine Land and Search Engine Journal collect the most agency-query citations even though they are publishers, not agencies; and Google plus Schema.org are the canonical sources LLMs reach for on GEO strategy questions. The full dataset is open and linked at the bottom.
What we ran
Four LLMs, 30 prompts, 120 attempted queries.
- Claude (Anthropic Claude Sonnet 4.6 via CLI, no web browsing, isolated working directory)
- ChatGPT (OpenAI gpt-4o-mini via API, no web browsing)
- Gemini (Google gemini-2.5-flash via API, no web browsing)
- Perplexity (headless browser on www.perplexity.ai)
Perplexity was blocked by Cloudflare during the run: the same probe pattern returns a security challenge whenever it is sent from data-center IPs. We are documenting this openly. In practice, this study covers three LLMs (Claude, ChatGPT, Gemini) producing 90 successful answers. We will rerun Perplexity from a residential proxy and update.
The 30 prompts split evenly across three buyer categories.
Tool selection (10 prompts). “Best AI SEO tool in 2026”, “Surfer SEO alternatives for AI search optimization”, “best LLM citation tracking tool”, “best schema markup generator”, “best Ahrefs alternative for AI search era SEO”, etc.
GEO strategy (10 prompts). “How do I get my website cited in ChatGPT answers”, “ranking factors for Google AI Overviews”, “how does E-E-A-T work in the AI search era”, “what structured data helps with AI search visibility”, etc.
Agency procurement (10 prompts). “Best SEO agency for 2026”, “top GEO optimization consultants”, “best AI SEO specialists”, “top SEO agencies for SaaS companies”, “best programmatic SEO agencies”, etc.
Each response was parsed for two things: tracked entities (a curated list of 50 SEO tools, agencies, publishers, and platforms) and verbatim domain mentions (a regex pass that picks up every example.com-style token). Raw text was stored alongside both extractions so anyone can replicate or audit our counts.
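A minimal sketch of what the domain pass could look like. The pattern below is an assumption for illustration, not the regex from the study's script, and `extract_domains` is a hypothetical name.

```python
import re

# Assumed pattern: one or more dot-separated labels followed by an
# alphabetic TLD of two or more letters. Not the study's actual regex.
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}\b",
    re.IGNORECASE,
)

def extract_domains(text: str) -> list[str]:
    """Return unique lowercase domain-like tokens in order of first appearance."""
    seen: list[str] = []
    for match in DOMAIN_RE.finditer(text):
        domain = match.group(0).lower()
        if domain not in seen:
            seen.append(domain)
    return seen

sample = "Try Otterly.ai or peec.ai; validate markup at https://validator.schema.org/ first."
print(extract_domains(sample))
# ['otterly.ai', 'peec.ai', 'validator.schema.org']
```

A pattern this loose also matches file names such as `report.csv`, so a real pass would likely need an allowlist of TLDs or a manual review step.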
Headline numbers
| Metric | Value |
|---|---|
| Prompts | 30 |
| LLMs | 4 attempted, 3 returned answers |
| Total queries attempted | 120 |
| Successful responses | 90 (75%; the Perplexity block accounts for the gap) |
| Distinct entities tracked | 50 SEO tools, agencies, publishers, platforms |
| Total entity mentions across all responses | 225 |
| Average entities mentioned per response | 2.5 |
| Average response length | 2,241 characters |
Top 10 most-cited entities, across all responses
This is the headline answer to “who do LLMs name when buyers ask SEO questions in 2026”. Counts are unique-response citations, not raw token frequencies. Percentage is share of the 90 successful responses.
| Rank | Entity | Citations | Share |
|---|---|---|---|
| 1 | Semrush | 21 | 23.3% |
| 2 | Ahrefs | 16 | 17.8% |
| 3 | Surfer SEO | 11 | 12.2% |
| 4 | Clearscope | 11 | 12.2% |
| 5 | MarketMuse | 8 | 8.9% |
| 6 | Moz | 8 | 8.9% |
| 7 | Search Engine Journal | 7 | 7.8% |
| 8 | Schema.org | 7 | 7.8% |
| 9 | Search Engine Land | 7 | 7.8% |
| 10 | Neil Patel | 6 | 6.7% |
A note on what we filtered out. Three entities appeared higher in raw counts but represent LLMs talking about the AI search landscape rather than citing a vendor: Google (29), Perplexity (26), and ChatGPT (24). When a prompt asks “ranking factors for Google AI Overviews”, the answer will mention Google regardless of which tool the model would actually recommend. We kept these out of the leaderboard above because they distort the buyer-citation signal. Both leaderboards live in the raw summary so you can verify.
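The counting rule described above (one citation per entity per response, share computed over successful responses, landscape entities excluded) can be sketched as follows. The function name and the toy input are illustrative assumptions, not the study's code or data.

```python
from collections import Counter

# Entities the note above treats as landscape mentions, not vendor citations.
LANDSCAPE = {"Google", "Perplexity", "ChatGPT"}

def leaderboard(responses, exclude=frozenset()):
    """Count each entity at most once per response.

    `responses` is a list where each item is the list of entities
    found in one response. Returns (entity, count, share %) rows.
    """
    total = len(responses)
    counts = Counter()
    for entities in responses:
        # set() collapses repeat mentions inside a single response
        counts.update(set(entities) - set(exclude))
    return [(name, n, round(100 * n / total, 1)) for name, n in counts.most_common()]

# Toy input for illustration only.
toy = [
    ["Semrush", "Semrush", "Google"],
    ["Ahrefs", "Semrush"],
    ["Google"],
    ["Semrush", "Ahrefs", "Perplexity"],
]
print(leaderboard(toy, exclude=LANDSCAPE))
# [('Semrush', 3, 75.0), ('Ahrefs', 2, 50.0)]
```

The same arithmetic reproduces the table: 21 citations over 90 responses is 23.3%.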
Per-LLM breakdown
This is where the personality of each model shows up.
| LLM | Responses | Avg entities / response | Distinct entities mentioned | Avg response length |
|---|---|---|---|---|
| Claude (Sonnet 4.6) | 30 | 4.1 | 27 | 1,543 chars |
| ChatGPT (gpt-4o-mini) | 30 | 2.3 | 19 | 2,343 chars |
| Gemini (2.5 flash) | 30 | 1.1 | 16 | 2,837 chars |
Claude is the highest-density citation engine. It names roughly twice as many tools and agencies per answer as ChatGPT and four times as many as Gemini, despite producing the shortest responses. If your buyers are using Claude, every named brand in the answer is competing harder for attention.
Gemini gives long, generic answers. Despite producing the longest responses in the test, Gemini cited the fewest specific brands. It defaults to category-level advice (“use a reputable SEO tool”) rather than naming products. That means Gemini is a worse channel for buyer-name discovery and a better channel for educational top-of-funnel.
ChatGPT splits the difference. It cites specific brands but leans heavily on a small handful of household names. ChatGPT named Semrush, Moz, and Ahrefs together more often than Claude did.
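To make the density comparison concrete, the sketch below recomputes it from the per-LLM table. "Entities per 1,000 characters" is a derived metric introduced here for illustration; it is not a column in the study's summary.

```python
# Figures copied from the per-LLM table above.
models = {
    "Claude": {"avg_entities": 4.1, "avg_chars": 1543},
    "ChatGPT": {"avg_entities": 2.3, "avg_chars": 2343},
    "Gemini": {"avg_entities": 1.1, "avg_chars": 2837},
}

def per_1000_chars(stats):
    """Average entities named per 1,000 characters of response text."""
    return 1000 * stats["avg_entities"] / stats["avg_chars"]

density = {name: round(per_1000_chars(s), 2) for name, s in models.items()}

# How many times more entities per answer Claude names than each model.
claude_avg = models["Claude"]["avg_entities"]
ratio = {name: round(claude_avg / s["avg_entities"], 1) for name, s in models.items()}

print(density)  # {'Claude': 2.66, 'ChatGPT': 0.98, 'Gemini': 0.39}
print(ratio)    # {'Claude': 1.0, 'ChatGPT': 1.8, 'Gemini': 3.7}
```

Per answer, Claude names about 1.8 times as many entities as ChatGPT and about 3.7 times as many as Gemini. Normalized by length, the gap widens because Claude's answers are the shortest.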
Per-LLM top-cited entities tell the same story.
Claude top 5: Perplexity (19), ChatGPT (16), Google (13), Ahrefs (8), Semrush (8)
ChatGPT top 5: Semrush (11), Moz (7), Google (7), Ahrefs (6), Surfer SEO (5)
Gemini top 5: Google (9), ChatGPT (5), Perplexity (4), Ahrefs (2), Semrush (2)
Claude has the widest awareness of the GEO-native tooling layer (Peec, Otterly, Profound). ChatGPT and Gemini stay closer to the legacy SEO incumbents.
Per-category breakdown: where the action is
Splitting the data by buyer intent surfaces a sharper picture of which queries are contested and which are wide open.
Tool selection: Semrush owns the conversation
10 tool-buyer prompts, 30 successful responses. Top entities cited.
| Entity | Citations | Share of tool-buyer responses |
|---|---|---|
| Semrush | 16 | 53.3% |
| Ahrefs | 13 | 43.3% |
| Surfer SEO | 11 | 36.7% |
| Clearscope | 8 | 26.7% |
| MarketMuse | 6 | 20.0% |
| Peec | 5 | 16.7% |
| Frase | 4 | 13.3% |
| Otterly.ai | 8 (domain count) | n/a |
Semrush is named in over half of tool-buyer responses. Ahrefs in 43%. The traditional SEO stack still owns the AI conversation for tool buyers, with Surfer SEO and Clearscope picking up the content-optimization slot. The interesting signal: Peec, MarketMuse, and Otterly.ai are the GEO-native names that show up consistently. These are the small companies that already broke into the citation set.
GEO strategy: Google and Schema.org are the canonical sources
10 strategy prompts, 30 successful responses. Top entities.
| Entity | Citations | Share of strategy responses |
|---|---|---|
| Google | 14 | 46.7% |
| Schema.org | 5 | 16.7% |
| (Publishers / tool vendors named) | small counts | n/a |
Strategy queries (E-E-A-T, AI Overviews ranking factors, structured data, backlinks for AI search) trigger LLMs to cite primary sources. Google Search Central documentation and Schema.org appear as the authority anchors. Specific vendor recommendations are rare. The takeaway for content marketers: ranking for GEO-strategy queries inside an AI answer means being the source the model treats as canonical, not being a vendor named alongside other vendors.
Agency procurement: publishers eat the citations
10 agency prompts, 30 successful responses. Top entities.
| Entity | Citations | Share of agency-procurement responses |
|---|---|---|
| Search Engine Land | 7 | 23.3% |
| Search Engine Journal | 6 | 20.0% |
| Moz | 6 | 20.0% |
| Semrush | 5 | 16.7% |
| Siege Media | 5 | 16.7% |
| Neil Patel | 5 | 16.7% |
The single biggest finding in the study: when a buyer asks an LLM “best SEO agency for 2026”, the model is more likely to cite a publisher’s listicle than an individual agency. Search Engine Land and Search Engine Journal are the dominant agency-procurement citations even though neither is an agency. Two practical implications. First, agencies trying to win citations should treat their inclusion in Search Engine Journal “best of” roundups as an AI-citation flywheel, not just a referral source. Second, the agencies that did break through (Siege Media, Neil Patel) did so largely on the strength of their own published content, not on directory listings.
Top cited domains (verbatim)
When responses included literal URLs or domain mentions, these were the most common.
| Domain | Mentions |
|---|---|
| otterly.ai | 8 |
| peec.ai | 4 |
| copy.ai | 4 |
| rankscale.ai | 3 |
| frase.io | 2 |
| schema.dev | 1 |
| validator.schema.org | 1 |
| dust.ai | 1 |
Otterly.ai is the standout. It received eight verbatim domain mentions despite being a newer entrant in the GEO tracking space. That number is higher than the total domain mentions Ahrefs, Semrush, and Moz received combined (zero verbatim domain mentions across this run; they were named as brand entities, not as URLs). The pattern matters because URL-level citations are what AI Overviews and Perplexity actually link out to. Brand-name citations build awareness; URL citations drive referral traffic.
What the winners are doing right
We took the top three entities by total citations (Semrush, Ahrefs, Surfer SEO) and the standout domain-mention winner (Otterly.ai) and looked at the shared properties of the content the models were pulling from.
1. Massive owned-content footprint with deep topic clusters. Semrush operates one of the largest SEO content libraries on the open web (2,000+ blog posts, 200+ glossary entries, several long-form academies). Ahrefs has a similar footprint. The pattern: LLMs cite the brand whose name appears in the training corpus alongside the most distinct sub-topics. Coverage breadth is a moat.
2. Entity-rich, definition-first content structure. Both Semrush’s blog and Ahrefs’ SEO glossary lead every article with a clean entity definition in the first sentence after the H2. That structure is exactly what answer-engine retrieval favors. Surfer SEO’s content does the same in its content score guides. Otterly.ai (smaller team, newer site) replicates this pattern: every page leads with a one-sentence answer.
3. Self-published research and original datasets. Semrush’s annual Search Trends report, Ahrefs’ large-scale link studies, and Surfer’s “what we found in 1M SERPs” pieces all earn citations because they are the source the model is pulling, not a summary of someone else’s source. Otterly.ai punches above its weight by publishing AI-citation studies of its own, and that becomes the citation chain.
4. Schema markup deployed correctly. All four winners run Article, FAQPage, and HowTo schema at scale. Search Engine Land and Search Engine Journal also run heavy schema on their listicles. This matters because retrieval models use schema to identify quotable claims and the entity behind them.
5. Named expert authors with verifiable credentials. Semrush, Ahrefs, Moz, and the publisher sites all run named-author bylines with bios. Anonymous content is harder for an LLM to attribute confidently. Authorship trust is a citation signal.
Three actionable takeaways
1. If you are a tool vendor, the floor for entering the citation set is roughly 500 indexed blog posts with named authors and definition-first structure. Below that, you are invisible to LLMs in tool-selection queries. Above that, the ranking inside the cited set is determined by data uniqueness and topic-cluster breadth.
2. If you are an agency, your highest-leverage AI-citation play is not your own website. It is being included in Search Engine Land, Search Engine Journal, and Moz roundups. Pitch journalists at those publications. Build the relationship. The citation flywheel runs through them.
3. If you are running GEO strategy content, you are competing with Google Search Central documentation and Schema.org. That is a very high bar. Either go deeper than the official docs (specific test results, novel datasets, unconventional angles) or accept that GEO-strategy queries are a brand-awareness play rather than a direct-conversion channel.
Methodology limitations
We are publishing this study with the methodology limits visible.
Sample size is small. Ten prompts per category (30 responses per category across three models) buys you stable rankings for the top entities (Semrush at 53% of tool-buyer responses is robust). It does not buy you confidence about the tail. Anything cited 1 or 2 times in this dataset is in the noise.
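For readers who want a rough error bar on a share such as 16 of 30, a Wilson score interval is one standard option. The sketch below is not part of the study's script, and `wilson_interval` is an illustrative name.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Semrush: 16 of 30 tool-buyer responses.
low, high = wilson_interval(16, 30)
print(f"{low:.3f} to {high:.3f}")  # 0.361 to 0.698

# A tail entity cited in 2 of 30 responses.
tail_low, tail_high = wilson_interval(2, 30)
```

At n=30 the interval around 53% runs from roughly 36% to 70%, so the exact percentage should be read loosely. For entities with one or two citations the interval is wide relative to the estimate itself, which is why we treat the tail as noise.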
Three of four models, not four. Perplexity returned a Cloudflare challenge from the data-center IP. We did not bypass the challenge because we wanted the methodology reproducible. The study would benefit from a residential-proxy rerun, which we will publish as an addendum.
Models without web browsing reflect training-cutoff knowledge. None of the three models that returned answers had web browsing enabled. Real consumer LLM products (ChatGPT, Claude, Perplexity, Gemini) often add web search at runtime, which can change which brands surface. The numbers in this study reflect the training-set citation prior: the baseline the models start from before web search kicks in. We consider that prior the more durable signal.
Single-shot prompts, no follow-up. Real users ask follow-up questions. The follow-up dynamics (which brands get re-named, which get dropped) are not captured here.
Entity matching is alias-based, not semantic. We tracked 50 named entities by lowercase substring matching with alias support. Edge cases (a model praising “Semrush’s keyword tool” versus simply naming “Semrush”) get collapsed into a single citation event. This understates rich-mention quality.
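A minimal sketch of lowercase substring matching with aliases, showing how a rich mention and a bare mention collapse into the same citation event. The alias table is hypothetical; the study's actual 50-entity list lives in the repo.

```python
# Hypothetical alias table for illustration only.
ALIASES = {
    "Semrush": ["semrush", "sem rush"],
    "Surfer SEO": ["surfer seo", "surferseo"],
    "Search Engine Journal": ["search engine journal", "searchenginejournal"],
}

def match_entities(text, aliases=ALIASES):
    """Return each tracked entity whose alias appears anywhere in the text."""
    lowered = text.lower()
    return [
        name
        for name, variants in aliases.items()
        if any(variant in lowered for variant in variants)
    ]

rich = "Semrush's keyword tool is the strongest option, and Semrush also tracks AI Overviews."
bare = "Options include Semrush."

print(match_entities(rich))  # ['Semrush']
print(match_entities(bare))  # ['Semrush']
```

Both responses score one Semrush citation, even though the first is a substantive recommendation and the second is a passing mention.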
Replication and raw data
The full study lives in /data/citations/study-2026-05-21.jsonl on the OnyxRank GitHub repo. Every response is in the JSONL with timestamp, model name, prompt, full response text, extracted entities, and extracted domain list. The script that produced it is /scripts/citation_study_2026.py in the same repo. Run it yourself with python3 scripts/citation_study_2026.py if you have OpenAI and Gemini keys plus the local Claude CLI.
Repo: github.com/77PVal/onyxrank
Summary CSV-equivalent table: data/citations/study-summary-2026-05-21.md
If you reuse the dataset, cite it as: OnyxRank LLM Citation Study, May 2026. n=90 successful responses across 3 LLMs and 30 buyer-intent prompts.
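If you want to re-tally the JSONL yourself, a sketch along these lines should be close. The key names `model` and `entities` are assumptions, so check the file's actual schema before relying on them.

```python
import json
from collections import Counter

def entity_counts_by_model(lines):
    """Tally unique-response entity mentions per model from JSONL lines."""
    tallies = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # "model" and "entities" are assumed key names, not confirmed.
        counter = tallies.setdefault(record["model"], Counter())
        counter.update(set(record.get("entities", [])))
    return tallies

# Illustrative records, not rows from the real dataset.
sample_lines = [
    '{"model": "claude", "prompt": "best AI SEO tool in 2026", "entities": ["Semrush", "Ahrefs"]}',
    '{"model": "claude", "prompt": "best schema markup generator", "entities": ["Schema.org"]}',
    '{"model": "gemini", "prompt": "best AI SEO tool in 2026", "entities": ["Semrush"]}',
]

tallies = entity_counts_by_model(sample_lines)
print(tallies["claude"]["Semrush"], tallies["gemini"]["Semrush"])  # 1 1
```

To run it against the real file, pass an open file handle for data/citations/study-2026-05-21.jsonl in place of `sample_lines`.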
About the author
Alex runs OnyxRank, an AI SEO and GEO optimization service that baselines client citations across Claude, ChatGPT, Perplexity, and Google AI Overviews on day one. We publish citation studies like this one quarterly because measuring is the first lever in fixing.
If you want us to run the same study against your brand, the OnyxRank citation baseline is the productized version. You get the JSONL, a tracked-entity report, and a 90-day citation delta on the prompts that matter for your buyer.