How to Block AI Crawlers (and When You Shouldn't) in 2026

TL;DR

To block AI crawlers, add User-agent rules in robots.txt disallowing GPTBot, ClaudeBot, CCBot, PerplexityBot, and Google-Extended. But block selectively: training crawlers like GPTBot take your content, while live-search bots like OAI-SearchBot can send you citation traffic.

How to block AI crawlers in robots.txt

To block AI crawlers, add User-agent and Disallow rules to the robots.txt file at the root of your domain, naming each bot you want to keep out. The fastest way to block the major ones is a single robots.txt file that names GPTBot (OpenAI), ClaudeBot (Anthropic), CCBot (Common Crawl), PerplexityBot, and Google-Extended (Google's AI training opt-out token). Place the file at https://yourdomain.com/robots.txt. Reputable AI companies generally honor the directives once their crawler next re-reads the file, typically within a day or two.

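Here is a copy-paste robots.txt that blocks the main training and scraping bots while leaving normal search engines untouched. Note that it also blocks PerplexityBot, which the cheat sheet further down classifies as a live-retrieval bot that sends traffic; drop that stanza if you want to stay eligible for Perplexity citations: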

# Block AI training + scraping crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

# Everyone else (Googlebot, Bingbot, humans) allowed
User-agent: *
Disallow:

One critical caveat before you ship that file: robots.txt is a request, not a wall. It works because well-behaved companies choose to obey it. It does not stop bots that ignore the standard, and it does not block scrapers using fake or rotating user-agent strings. If you need enforcement, you pair robots.txt with firewall or WAF rules. For the protocol fundamentals, see what is robots.txt.
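
As a rough illustration of what "pairing robots.txt with enforcement" can look like at the application layer, here is a minimal Python sketch of a WSGI middleware that returns 403 to requests whose User-Agent header contains a blocked token. The token list and the names `is_blocked` and `block_ai_crawlers` are assumptions made for this example, not part of any vendor's tooling. It only catches bots that identify themselves honestly; spoofed or rotating user-agents still need IP-based or behavioral rules at the firewall or WAF.

```python
# Hypothetical list of user-agent tokens to refuse; adjust to your own policy.
BLOCKED_UA_TOKENS = ("gptbot", "ccbot", "bytespider")


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent header contains any blocked token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_TOKENS)


def block_ai_crawlers(app):
    """Wrap a WSGI app so that blocked user-agents get a 403 response."""

    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

Doing this in application code keeps the sketch self-contained; in practice the same substring match is usually cheaper to express as a web server or WAF rule.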

The honest trade-off: training bots vs. live-search bots

The hard part of blocking AI crawlers is that "AI crawler" is two very different jobs wearing one label. Some bots scrape your content to train a model — you get nothing back. Other bots fetch your page in real time to answer a live user question, and they cite you with a clickable link. Block the first kind and you lose nothing. Block the second kind and you cut off a growing source of referral traffic.

The clearest example is OpenAI's split fleet. GPTBot crawls the open web to gather training data for future models. OAI-SearchBot fetches pages live to power ChatGPT's search and inline citations. They are separate user-agents on purpose — so you can block training while keeping search. The same pattern shows up across vendors: training crawlers take, live-search crawlers cite.

If a bot puts a link back to your site in front of a user, blocking it is usually a mistake. If a bot only feeds a training corpus you'll never be credited in, blocking it costs you nothing.

So the strategy most sites land on is not "block everything" or "allow everything" — it's a deliberate allowlist. Block the pure-training and bulk-scraping bots; allow the live-search and answer bots that can drive clicks. We walk through the exact allow/block split — and how to earn citations from the bots you keep — in how to get cited by Claude.
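
To make the allow/block split concrete, the sketch below embeds one possible selective robots.txt as a Python string and checks it with the standard library's `urllib.robotparser`. The policy itself is an assumption for illustration (it blocks GPTBot, CCBot, and Bytespider and leaves everything else allowed), and `example.com` is a placeholder. Python's parser is only an approximation of how each real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical selective policy: block training and bulk-scraping bots,
# leave live-search bots and normal search engines under the wildcard.
SELECTIVE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Disallow:
"""


def allowed(robots_txt: str, user_agent: str,
            url: str = "https://example.com/post") -> bool:
    """Return True if user_agent may fetch url under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("GPTBot", "CCBot", "OAI-SearchBot", "PerplexityBot", "Googlebot"):
        verdict = "allowed" if allowed(SELECTIVE_ROBOTS_TXT, bot) else "blocked"
        print(f"{bot}: {verdict}")
```

Bots the cheat sheet marks "Your call" (ClaudeBot, Google-Extended) are left allowed in this sketch; add a stanza for each if you decide otherwise.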

The AI crawler cheat sheet: who's who

Below is the lookup table for the crawlers worth caring about in 2026, what each one does, and the default recommendation. "Block" means it mostly takes for training with no citation upside. "Allow" means it can send you traffic. "Your call" means it depends on whether you want your content used for AI features at all.

Major AI crawlers in 2026: what each does and whether to block it
Bot (User-agent) | Owner | What it does | Default call
---------------- | ----- | ------------ | ------------
GPTBot | OpenAI | Crawls the web to train future GPT models | Block (training only)
OAI-SearchBot | OpenAI | Fetches pages live for ChatGPT search + citations | Allow (sends traffic)
ClaudeBot | Anthropic | Crawls for training and live answer fetches | Your call
PerplexityBot | Perplexity | Live retrieval for cited Perplexity answers | Allow (sends traffic)
Google-Extended | Google | Opt-out token for Gemini training + AI features | Your call (no SEO impact)
CCBot | Common Crawl | Bulk archive many AI labs train on indirectly | Block (training only)
Bytespider | ByteDance | Aggressive scraping for training | Block

A few notes the table can't hold: Google-Extended is not a crawler — it's a token Googlebot reads to decide whether your content trains Gemini and feeds AI features. Blocking it does NOT remove you from Google Search or AI Overviews; it only opts you out of model training. CCBot belongs to Common Crawl, a nonprofit archive that many AI labs train on indirectly, so blocking it quietly cuts off several downstream models at once.

Step-by-step: deciding what to block

Don't copy a random robots.txt from a forum. The right answer depends on whether you sell content, want AI citations, or just want to stop bandwidth-eating scrapers. Walk the decision below before you write a single Disallow line.

Should you block this AI crawler?
  1. Identify the bot's job: Check whether the user-agent crawls for model training or fetches pages live to answer user queries.
  2. Does it cite you? If the bot links back to your site in answers (live-search bots), blocking it usually costs you traffic.
  3. Is your content your product? Paywalled, proprietary, or premium content is the strongest case for blocking training bots broadly.
  4. Write targeted robots.txt rules: Add a User-agent + Disallow block per bot you decided to block; leave search engines under User-agent: *.
  5. Validate and monitor: Confirm Googlebot and your sitemap aren't blocked, then run an audit to verify the rules parse correctly.
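
The first three steps can be read as a small decision function. The Python sketch below is one possible encoding, not a definitive rule: the function name `recommend`, the job labels "training" and "live-search", and the choice to let premium content override citation upside are all assumptions of this example.

```python
def recommend(job: str, cites_you: bool, content_is_product: bool) -> str:
    """Suggest 'block', 'allow', or 'your call' for a single AI crawler.

    job: 'training' or 'live-search' (labels assumed for this sketch).
    cites_you: whether the bot's answers link back to your site.
    content_is_product: whether the content is paywalled or proprietary.
    """
    if content_is_product:
        # Assumption: premium content is treated as a reason to block broadly.
        return "block"
    if job == "live-search" and cites_you:
        # Blocking a bot that links back usually costs traffic.
        return "allow"
    if job == "training":
        # Training-only bots offer no citation upside.
        return "block"
    return "your call"


if __name__ == "__main__":
    print(recommend("training", cites_you=False, content_is_product=False))
    print(recommend("live-search", cites_you=True, content_is_product=False))
```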

After you publish your rules, verify they actually parse and that you haven't accidentally blocked Googlebot or your own sitemap. A typo like a stray Disallow: / under User-agent: * can deindex your whole site. Run your domain through our free SEO + GEO audit to confirm your robots.txt is valid and to see which AI bots you're currently allowing or blocking — the AI bots blocked check flags this specifically.
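
For a quick local check before publishing, a sketch like the following uses Python's standard `urllib.robotparser` to flag the failure mode described above. The function name `audit_robots`, the `example.com` placeholder, and the `/sitemap.xml` path are assumptions for illustration; it is a sanity check, not a replacement for a full audit.

```python
from urllib.robotparser import RobotFileParser


def audit_robots(robots_txt: str, site: str = "https://example.com") -> list[str]:
    """Return a list of problems found in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    problems = []
    # Search crawlers should still be able to reach the homepage.
    for agent in ("Googlebot", "Bingbot"):
        if not parser.can_fetch(agent, f"{site}/"):
            problems.append(f"{agent} is blocked from the homepage")
    # Assumed sitemap location; change it if yours lives elsewhere.
    if not parser.can_fetch("Googlebot", f"{site}/sitemap.xml"):
        problems.append("Googlebot cannot fetch /sitemap.xml")
    return problems


if __name__ == "__main__":
    broken = "User-agent: *\nDisallow: /\n"  # the stray wildcard rule
    for problem in audit_robots(broken):
        print(problem)
```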

If you decide you want AI engines to find AND cite you (the opposite of blocking), the next lever is answer-first content built for AI engines — covered in how to do AI search optimization and what is generative engine optimization.

Does blocking AI crawlers hurt your SEO?

Blocking AI crawlers does not hurt traditional SEO, because the user-agents you block for AI (GPTBot, ClaudeBot, CCBot, PerplexityBot, Google-Extended) are separate from the search crawlers that index you (Googlebot, Bingbot). You can block every training bot and keep your full Google ranking — they read different rules. The robots.txt above is explicitly built so Googlebot and Bingbot fall under the permissive User-agent: * line.

Where it gets nuanced is AI search visibility, which is a newer, separate channel from blue-link SEO. Blocking the live-search bots — OAI-SearchBot, PerplexityBot, ClaudeBot's user-facing fetches — means you won't appear as a cited source in ChatGPT, Perplexity, or Claude answers. That doesn't dent your Google rank, but in 2026 a meaningful slice of discovery happens inside AI answers, and a blocked site simply isn't eligible to be cited.

So frame the decision honestly: blocking AI crawlers protects your content from uncredited training use at the cost of AI-answer visibility. If your business depends on being the source AI tools quote, block training bots only. If your content is your product (paywalled news, courses, proprietary research), blocking broadly is the rational defensive move. There's no universally correct answer — only the right answer for your model.

People also ask

Should I block AI crawlers?

Whether you should block AI crawlers depends on your goal. Block pure-training bots like GPTBot and CCBot if you don't want your content used to train models for free, since they give nothing back. But avoid blocking live-search bots like OAI-SearchBot and PerplexityBot, which cite you with clickable links and can drive referral traffic. Most sites benefit from a selective allowlist rather than blocking everything.

How do I block GPTBot?

To block GPTBot, add the lines `User-agent: GPTBot` and `Disallow: /` to the robots.txt file at the root of your domain. OpenAI honors this directive within a day or two of its next crawl. Note that blocking GPTBot stops OpenAI's training crawler only — it does not block OAI-SearchBot, the separate user-agent that powers ChatGPT search citations, which you must disallow by name if you also want to block live search.

Does blocking AI crawlers hurt SEO?

Blocking AI crawlers does not hurt traditional Google or Bing SEO, because AI bots like GPTBot and Google-Extended use different user-agents than Googlebot and Bingbot. Your search rankings stay intact. The one trade-off is AI search visibility: blocking live-search bots removes you from being cited in ChatGPT, Perplexity, and Claude answers, which is a separate channel from blue-link rankings.

What is the difference between GPTBot and OAI-SearchBot?

GPTBot and OAI-SearchBot are both OpenAI crawlers but do opposite jobs. GPTBot scrapes the web to gather training data for future models and credits no one. OAI-SearchBot fetches pages in real time to answer ChatGPT search queries and cites your site with a clickable link. OpenAI keeps them as separate user-agents so you can block training while still allowing search citations.

Will robots.txt actually stop AI crawlers?

Robots.txt stops AI crawlers only if the company behind the bot chooses to obey it, which the major players like OpenAI, Anthropic, and Perplexity do. It does not physically block anything, so bad-faith scrapers using fake user-agents can ignore it entirely. For real enforcement, pair robots.txt directives with firewall or WAF rules that block by IP or behavior.

Frequently asked questions

Does blocking Google-Extended remove me from Google Search?

No. Google-Extended is a token that only controls whether your content trains Gemini and feeds Google's AI features. Blocking it leaves your normal Google Search and AI Overviews presence fully intact, because Googlebot reads a different set of rules. It is purely a training opt-out, not a search opt-out.

How do I block all AI crawlers at once?

There is no single wildcard for AI crawlers, so you must name each user-agent in robots.txt with its own Disallow rule: GPTBot, ClaudeBot, CCBot, PerplexityBot, Google-Extended, and others. Putting `Disallow: /` under `User-agent: *` would also catch Googlebot and Bingbot, which would damage your search rankings. List the AI bots explicitly and keep the wildcard rule permissive.
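
Because each bot needs its own stanza, generating the file from a list can help keep it consistent. This is a minimal Python sketch; the function name `build_robots_txt` is made up for this example, and the list of bots passed in is whatever you decided to block.

```python
def build_robots_txt(blocked_bots: list[str]) -> str:
    """Build a robots.txt that disallows each named bot and allows the rest."""
    # One User-agent + Disallow stanza per blocked bot.
    stanzas = [f"User-agent: {bot}\nDisallow: /" for bot in blocked_bots]
    # Permissive wildcard stanza so search engines are not caught.
    stanzas.append("User-agent: *\nDisallow:")
    return "\n\n".join(stanzas) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(["GPTBot", "CCBot", "Google-Extended"]), end="")
```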

Can I block AI training but still get cited in AI search?

Yes, and that is the recommended setup for most sites. Block the training-only bots like GPTBot and CCBot while allowing the live-search bots like OAI-SearchBot and PerplexityBot that fetch pages to answer queries and cite you. This protects your content from uncredited training use while keeping you eligible for AI-answer citations.
