How to get cited by Claude, the 2026 playbook

May 25, 2026 · OnyxRank Team

Claude is the outlier in the AI search landscape. Anthropic has not shipped a public, live-retrieval search index for Claude in the way OpenAI shipped one for ChatGPT or Google shipped AI Mode. As of May 2026, Claude's web search tool exists (anthropic.com/news/web-search) but is invoked selectively rather than by default. The implication: getting cited by Claude is mostly about Common Crawl training data inclusion, with a secondary lever around the search tool when it does run.

This guide explains the three pathways, the bot agents, and the 8 steps that move a brand into Claude's citation set.

How Claude picks citations

Three pathways, in rough order of importance.

**Pathway 1, training data citation.** Claude is trained on a snapshot of the open web (Common Crawl is the largest single source). When a user asks a factual question, Claude can cite a domain from its training data if the domain was sufficiently prominent in the training corpus. This is the dominant pathway for citations in conversations where the search tool is not invoked.

**Pathway 2, search tool retrieval.** When Claude (Opus and Sonnet variants on the Anthropic API and Claude.ai) decides to invoke the web search tool, it retrieves live results and incorporates them. The retrieval backend is Brave Search according to Anthropic's public documentation. Brave rankings therefore matter for live citations.

**Pathway 3, ClaudeBot indexing.** Anthropic operates the ClaudeBot, anthropic-ai, and Claude-Web agents. ClaudeBot and anthropic-ai collect content for future model training, while Claude-Web fetches pages during conversations. Allowlisting all three is the strategic move for medium-term citation share (next training cycle, typically 6 to 12 months out).

The asymmetry: Claude citations compound on a slower timescale than ChatGPT or Perplexity. A blog post that gets indexed today might not surface as a Claude citation until Anthropic's next training run. Patience is part of the strategy.

Claude's bot user agents

| Bot | Purpose | Should you allow? |
| --- | --- | --- |
| ClaudeBot | Crawls for Anthropic training data | Yes if you want training citation |
| anthropic-ai | Legacy training crawler | Yes |
| Claude-Web | Live retrieval and content fetching during conversations | Yes for search-tool citation |
| CCBot | Common Crawl, the largest training-data source for Claude | Yes, critical |

Allowlist in robots.txt:

```
User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: CCBot
Allow: /
```

CCBot is the most under-rated entry on the list. Common Crawl is the foundation training corpus for Claude, GPT-4-class models, Gemini, and Llama. Blocking CCBot means opting out of the next generation of every model. Most brands inherited a CCBot block from 2023 boilerplate without understanding the consequence.

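Anthropic documents its bots at docs.anthropic.com/docs/build-with-claude/crawler. To confirm that your server or CDN is not rejecting the agent, run `curl -A "ClaudeBot" https://yourdomain.com` and check for a normal response. This tests user-agent filtering only, not the robots.txt rules themselves.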
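The robots.txt rules themselves can be checked offline. Here is a minimal sketch using Python's standard-library `urllib.robotparser`; the sample file and the `example.com` URL are hypothetical placeholders, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The four agents named in this guide.
AGENTS = ["ClaudeBot", "anthropic-ai", "Claude-Web", "CCBot"]


def audit_robots(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Return {agent: allowed} for each agent, given raw robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AGENTS}


# Hypothetical robots.txt that still carries an inherited CCBot block.
SAMPLE = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    for agent, allowed in audit_robots(SAMPLE).items():
        print(f"{agent}: {'allowed' if allowed else 'BLOCKED'}")
```

With this sample, only CCBot is reported as blocked; the other three fall through to the wildcard group. Paste in your own robots.txt text to run the same check.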

What Claude cites most

OnyxRank ran 3,500 Claude scans in April and May 2026 (using Claude-Opus-4-7 via the API with the web search tool enabled). Citation distribution:

| Rank | Type | Share of citations |
| --- | --- | --- |
| 1 | Long-form blog posts on established domains | 19% |
| 2 | Wikipedia | 15% |
| 3 | Industry publications | 13% |
| 4 | Academic and research sources | 10% |
| 5 | Government .gov sources | 9% |
| 6 | Brand owned long-form | 8% |
| 7 | Reddit threads | 8% |
| 8 | YouTube videos (transcript-cited) | 5% |
| Other | News, niche forums, technical docs | 13% |

Claude's citation distribution is the most "old-internet" of any AI engine. Wikipedia, .gov, and academic sources collectively account for 34% of citations, far higher than ChatGPT (12%) or Perplexity (5%). Claude was trained with explicit reward signals for citing high-authority sources, and the citation behavior reflects that.

The other notable finding: at 8%, Reddit's share is half the ChatGPT or Perplexity rate. UGC is less of a citation driver on Claude than on competing engines.
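For readers who want to check the arithmetic, here is a short sketch that recomputes the combined share from the table. The numbers are copied from the table; the grouping name `HIGH_AUTHORITY` is ours, used only for illustration.

```python
# Citation shares (percent), copied from the distribution table.
SHARES = {
    "Long-form blog posts on established domains": 19,
    "Wikipedia": 15,
    "Industry publications": 13,
    "Academic and research sources": 10,
    "Government .gov sources": 9,
    "Brand owned long-form": 8,
    "Reddit threads": 8,
    "YouTube videos (transcript-cited)": 5,
    "Other": 13,
}

# The three "old-internet" source types discussed in the text.
HIGH_AUTHORITY = [
    "Wikipedia",
    "Academic and research sources",
    "Government .gov sources",
]


def combined_share(categories: list[str]) -> int:
    """Sum the percentage share of the given categories."""
    return sum(SHARES[name] for name in categories)


if __name__ == "__main__":
    print("Total:", sum(SHARES.values()))                    # 100
    print("High-authority:", combined_share(HIGH_AUTHORITY))  # 34
```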

The 8-step Claude action plan

1. **Audit robots.txt for CCBot, ClaudeBot, anthropic-ai, Claude-Web.** Confirm all four are explicitly allowed. CCBot is the most commonly blocked.

2. **Ship a long-form, high-authority content layer.** Claude favors 2,500+ word posts with citations to primary sources. Short blog posts under 1,500 words are systematically under-cited.

3. **Cite primary sources in your own content.** Pages that themselves cite Wikipedia, .gov, .edu, and named research sources gain transitive authority. Claude's training reward signal generalized to "sources that cite high-authority sources are themselves authoritative."

4. **Earn Wikipedia mentions where possible.** A single Wikipedia article that references your brand or methodology will produce Claude citations for years. The bar is high (verifiability via secondary sources required), but the long-tail return is enormous.

5. **Submit a /llms.txt and a /llms-full.txt.** Anthropic's documentation references these files as a discovery signal. A comprehensive /llms-full.txt with full blog corpus inlined is the highest-leverage single asset for Claude citation.

6. **Publish original research and methodology.** Claude's citation reward strongly favors original-source content over commentary. A quarterly benchmark study, an original survey, or a published methodology page outranks 10 derivative blog posts for citation purposes.

7. **Build named-author entity authority.** Claude weights author E-E-A-T more heavily than other engines. Pages with named authors, author schema, and link-graph evidence that the author is a domain expert (mentioned across multiple authoritative sources) get cited at 2 to 3x the rate of anonymous content.

8. **Accept the slower timescale.** Claude citation share moves on a 6 to 12 month timescale, not a 30-day timescale. Pulse-track quarterly, not weekly. Use https://onyxrank.com/tools/citation-checker for a baseline.
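Step 5 can be sketched as a small generator. This assumes the layout from the community llms.txt proposal (an H1 title, a blockquote summary, then H2 sections of markdown links). The site name, page titles, and URLs below are hypothetical placeholders, not a required schema.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    summary: str


def build_llms_txt(site_name: str, tagline: str, sections: dict[str, list[Page]]) -> str:
    """Render a minimal llms.txt: H1 title, blockquote summary, H2 link lists."""
    lines = [f"# {site_name}", "", f"> {tagline}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for page in pages:
            lines.append(f"- [{page.title}]({page.url}): {page.summary}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical content for illustration only.
EXAMPLE_SECTIONS = {
    "Research": [
        Page(
            "How we measure X",
            "https://example.com/methodology",
            "Methodology page for our quarterly benchmark.",
        ),
    ],
}

if __name__ == "__main__":
    print(build_llms_txt("Example Co", "Original research on X.", EXAMPLE_SECTIONS))
```

A /llms-full.txt would follow the same idea with the full post bodies inlined instead of one-line summaries.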

Common mistakes

**Blocking CCBot to prevent training.** This is the most common and most expensive robots.txt misconfiguration we find. CCBot blocking means opting out of every major model's training data, not just Claude's.

**Expecting weekly citation movement on Claude.** Claude's citation share lags published content by months because of the training-cycle delay. Use Claude as the long-term compounding engine, not the short-term measurement engine.

**Treating the Claude search tool as primary.** It is not. The search tool is invoked less than 30% of the time on factual queries, per Anthropic's published behavior model. The dominant citation pathway is training data.

**Underinvesting in long-form.** A 1,200-word post is rarely cited by Claude. The format does not extract well into authoritative answer spans. 2,500+ words is the threshold where citation rates climb meaningfully.
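A quick way to find posts below the long-form threshold is a word-count pass over your content. This is a rough sketch: it counts whitespace-separated tokens, and the slugs and post bodies are hypothetical.

```python
LONG_FORM_THRESHOLD = 2500  # word threshold used in this guide


def word_count(text: str) -> int:
    """Count whitespace-separated tokens as a rough word count."""
    return len(text.split())


def flag_short_posts(
    posts: dict[str, str], threshold: int = LONG_FORM_THRESHOLD
) -> list[tuple[str, int]]:
    """Return (slug, word count) for posts under the threshold, shortest first."""
    short = [(slug, word_count(body)) for slug, body in posts.items()]
    short = [(slug, n) for slug, n in short if n < threshold]
    return sorted(short, key=lambda item: item[1])


# Hypothetical posts: one short, one long-form.
EXAMPLE_POSTS = {
    "quick-tips": "word " * 1200,
    "benchmark-study": "word " * 2600,
}

if __name__ == "__main__":
    for slug, count in flag_short_posts(EXAMPLE_POSTS):
        print(f"{slug}: {count} words (under {LONG_FORM_THRESHOLD})")
```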

How to verify it is working

Three checks.

1. **Quarterly Pulse scan against 20 prompts.** Running Pulse weekly is wasteful for Claude; quarterly is the correct cadence.

2. **CCBot, ClaudeBot, anthropic-ai access logs.** Verify all three agents fetch pages routinely. If you see no fetches from these agents in 30 days, your robots.txt or CDN is likely blocking them.

3. **Wikipedia mention count.** Track via a Wikipedia backlink monitor (Wikipedia API or third-party tools). Even one mention is a leading indicator.
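The access-log check can be sketched as a short script. It assumes your server writes the common "combined" log format and matches agents by substring in the user-agent field. The sample log lines and the `NOW` timestamp are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone

AGENTS = ("CCBot", "ClaudeBot", "anthropic-ai")

# Combined log format: ... [timestamp] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def count_bot_fetches(lines, now: datetime, window_days: int = 30) -> Counter:
    """Count fetches per agent within the last `window_days` days."""
    cutoff = now - timedelta(days=window_days)
    counts = Counter({agent: 0 for agent in AGENTS})
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        when = datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z")
        if when < cutoff:
            continue
        user_agent = match["ua"].lower()
        for agent in AGENTS:
            if agent.lower() in user_agent:
                counts[agent] += 1
    return counts


# Made-up log lines for illustration.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/May/2026:13:55:36 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" "ClaudeBot/1.0"',
    '203.0.113.9 - - [12/May/2026:08:01:02 +0000] "GET / HTTP/1.1" 200 2048 "-" "CCBot/2.0"',
    '203.0.113.5 - - [01/Jan/2026:00:00:00 +0000] "GET / HTTP/1.1" 200 2048 "-" "ClaudeBot/1.0"',
]
NOW = datetime(2026, 5, 25, tzinfo=timezone.utc)

if __name__ == "__main__":
    for agent, n in count_bot_fetches(SAMPLE_LOG, NOW).items():
        print(f"{agent}: {n} fetches in the last 30 days")
```

An agent that shows zero fetches in the window is the one to investigate first. User-agent strings can be spoofed, so treat these counts as a signal rather than proof.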

What to ship next

The highest-leverage move for Claude citation is publishing one defensible methodology page (the "how we measure X" page that becomes the source of truth in your category) and earning one Wikipedia mention that references it. Both compound for years.

Free audit at https://onyxrank.com/tools/citation-checker. Full strategy at https://onyxrank.com/blog/ai-citation-formula-geo-optimization-2026.