Allowing AI crawlers in robots.txt: the 2026 list

In 2024, "AI crawler" basically meant GPTBot. In 2026, there are at least seven user-agents you should know about, each with a different purpose. Some train models, some index for live AI search answers, and some do both. Treating them as a single category is a mistake.

The crawlers that actually matter

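  • GPTBot — OpenAI's training crawler. Collects content that may be used to train OpenAI's models.
  • OAI-SearchBot — OpenAI's search crawler. Used by ChatGPT's live search feature. Block it and your pages can no longer surface in ChatGPT's live search answers.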
  • ClaudeBot — Anthropic's crawler. Used to ground Claude's search answers and (separately) for training under Anthropic's policies.
  • PerplexityBot — Perplexity's crawler. Powers their real-time citation engine.
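  • Google-Extended — Google's opt-out token for Gemini training and grounding. It is a robots.txt control token, not a separate crawler. Blocking it does NOT remove you from regular Google Search, which is still controlled by Googlebot; AI Overviews are part of Search, so they follow Googlebot rules rather than this token.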
  • CCBot — Common Crawl's crawler. Its open corpus is used by researchers and model builders alike. Block it and new crawls of your site stop entering one of the most widely used public training datasets.
  • anthropic-ai — Older Anthropic user-agent, still seen in the wild. Keep allowing it for backward compatibility.
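
The Google-Extended point generalizes: robots.txt rules apply per user-agent group, so blocking one token leaves the others untouched. Here is a minimal sketch using Python's standard urllib.robotparser. The example.com URL and the sample rules are placeholders, and Python's matcher only approximates how each vendor's crawler actually interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that opts out of Google-Extended only.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/article"  # placeholder URL
for agent in ("Google-Extended", "Googlebot", "OAI-SearchBot"):
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent}: {verdict}")
```

Only Google-Extended is reported as blocked. Agents with no matching group and no wildcard group fall back to the default, which is to allow.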

The recommended default for 2026

For most sites (content publishers, SaaS companies, indie hackers, portfolios), we recommend allowing all of the above. The upside (showing up in AI answers) significantly outweighs the downside (your content training a future model that may compete with you in some abstract way).

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: anthropic-ai
Allow: /

If you are a paid-content publisher (newspaper, premium course platform), the calculation flips: block the training-only bots (GPTBot, CCBot, Google-Extended) but keep the live-search bots (OAI-SearchBot, ClaudeBot, PerplexityBot) allowed so you still get citations.
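
The paid-publisher policy can be written out by hand, but a small generator keeps the two groups explicit. This is a sketch only: the helper name render_robots is hypothetical, and the split into training-only and live-search bots follows this article's classification, not an official taxonomy.

```python
# Crawler roles as classified in this article (an assumption, not a vendor spec).
TRAINING_ONLY = ["GPTBot", "CCBot", "Google-Extended"]
LIVE_SEARCH = ["OAI-SearchBot", "ClaudeBot", "PerplexityBot"]


def render_robots(blocked: list[str], allowed: list[str]) -> str:
    """Return robots.txt text with one group per user-agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked]
    groups += [f"User-agent: {agent}\nAllow: /" for agent in allowed]
    # Blank lines separate groups, matching the layout used above.
    return "\n\n".join(groups) + "\n"


print(render_robots(blocked=TRAINING_ONLY, allowed=LIVE_SEARCH))
```

The output starts with a Disallow group for GPTBot and ends with an Allow group for PerplexityBot. Paste it into your robots.txt alongside any existing rules for other crawlers.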

How we audit it

Our auditor fetches your robots.txt and checks which AI user-agents are allowed vs. blocked, flagging any combination that produces an unintended outcome, such as blocking OAI-SearchBot while allowing GPTBot (your content can train their model, but they cannot cite you). Run a free audit and see your current state in 5 seconds.
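
To make that kind of check concrete, here is a rough sketch built on Python's standard urllib.robotparser. It is not our auditor's actual implementation: the function names audit and find_conflicts are hypothetical, only the single conflict described above is checked, and Python's parser does not implement every matching rule real crawlers use.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot",
             "Google-Extended", "CCBot", "anthropic-ai"]


def audit(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Map each AI user-agent to True (allowed) or False (blocked) for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}


def find_conflicts(status: dict[str, bool]) -> list[str]:
    """Flag the one combination discussed above; a real audit would check more."""
    warnings = []
    if status["GPTBot"] and not status["OAI-SearchBot"]:
        warnings.append("GPTBot allowed but OAI-SearchBot blocked: content "
                        "can be used for training but not cited in live search.")
    return warnings


# Sample input: a site that blocks only the search crawler.
sample = "User-agent: OAI-SearchBot\nDisallow: /\n"
status = audit(sample)
for agent, allowed in status.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
for warning in find_conflicts(status):
    print("WARNING:", warning)
```

The sample parses a string so it runs offline. To check a live site, RobotFileParser also provides set_url() and read() for fetching the file over HTTP.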
