AI crawler is the umbrella term for the web bots that AI engines operate to index content for citation, search, and (sometimes) training. As of 2026 there are roughly 30 active AI crawlers across major engines: GPTBot, ChatGPT-User, OAI-SearchBot (OpenAI); ClaudeBot, anthropic-ai, Claude-User (Anthropic); PerplexityBot, Perplexity-User; Google-Extended; Applebot-Extended; Bytespider, Amazonbot, CCBot, cohere-ai, Meta-ExternalAgent, and others.
Allowing AI crawlers in robots.txt is the foundational AIO move. Many sites default to blocking them. Blocking AI crawlers is the 2026 equivalent of blocking Googlebot in 2005: once a defensible IP-protection move, it now makes you invisible to a meaningful share of buyer queries.
Each engine maintains separate crawler policies and IP ranges. Verify access via server logs (look for the User-Agent strings) or use crawler-simulation tools.
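As a rough illustration of the server-log check, the sketch below counts hits per crawler by looking for User-Agent tokens in access-log lines. The token list comes from the crawler names in this entry; the function name, the sample log lines, and the log layout are assumptions made for the example. Google-Extended and Applebot-Extended are left out on the assumption that they are robots.txt control tokens rather than distinct request User-Agents.

```python
from collections import Counter

# User-Agent tokens taken from the crawler names listed in this entry.
# Google-Extended and Applebot-Extended are omitted (assumed to be
# robots.txt tokens that do not appear as request User-Agents).
AI_CRAWLER_TOKENS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "ClaudeBot", "anthropic-ai", "Claude-User",
    "PerplexityBot", "Perplexity-User",
    "Bytespider", "Amazonbot", "CCBot", "cohere-ai", "Meta-ExternalAgent",
]


def count_ai_crawler_hits(log_lines):
    """Count log lines per AI crawler via case-insensitive substring match."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in lowered:
                hits[token] += 1
                break  # attribute each line to at most one crawler
    return hits


# Made-up sample lines in a common access-log layout (hypothetical data).
sample_log = [
    '203.0.113.5 - - [01/Jan/2026:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"',
    '203.0.113.9 - - [01/Jan/2026:00:00:02 +0000] "GET /pricing HTTP/1.1" 200 900 "-" '
    '"Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2026:00:00:03 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

for crawler, count in count_ai_crawler_hits(sample_log).items():
    print(crawler, count)
```

A User-Agent string can be spoofed, so a match here only shows that something claiming to be the crawler visited. Comparing the source IP against the engine's published ranges is the stronger check.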
Frequently asked
Which AI crawlers should I allow?
The major ones in 2026: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, Bytespider, CCBot, and Amazonbot. Most sites should default to allowing all, then disallow per use case.
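One way to express "allow all, then disallow per use case" is to generate robots.txt from a short blocklist. This is a minimal sketch: build_robots_txt is a hypothetical helper, and blocking Bytespider is only an example choice, not a recommendation from this entry.

```python
def build_robots_txt(blocked_crawlers=()):
    """Return robots.txt text: allow every bot by default, block named ones."""
    lines = ["User-agent: *", "Allow: /"]
    for name in blocked_crawlers:
        # One group per blocked crawler, separated by a blank line.
        lines += ["", f"User-agent: {name}", "Disallow: /"]
    return "\n".join(lines) + "\n"


# Example: allow everything except one crawler (illustrative choice only).
print(build_robots_txt(["Bytespider"]))
```

Keeping the blocklist as data makes each exception an explicit, reviewable decision rather than a line buried in a hand-edited file.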
How do I block AI crawlers if I want to?
Add a User-agent: <crawler-name> line followed by a Disallow: / line in robots.txt. Note that when the crawler you block also feeds search or answers, this blocks citation inside AI answer engines, not just training. Most teams find the citation loss outweighs the training opt-out value.
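Before deploying a rule like this, it can help to confirm what it blocks. The sketch below uses Python's standard urllib.robotparser to test a sample robots.txt against two crawler names. The robots.txt content, the is_allowed helper, and the example.com URL are assumptions for illustration, and a real crawler's own matching logic may differ from this parser's.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt: block GPTBot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/pricing"))     # False
print(is_allowed(ROBOTS_TXT, "ClaudeBot", "https://example.com/pricing"))  # True
```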
Do AI crawlers respect robots.txt?
The major ones (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended) publicly commit to respecting robots.txt. Less-established crawlers vary; check published policies before assuming compliance.
More from AIO and AEO
AI Visibility: How often and how accurately a brand appears inside AI engine answers (ChatGPT, Claude, Perplexity).
GPTBot: OpenAI's web crawler that indexes content for ChatGPT and the OpenAI training corpus.
ClaudeBot: Anthropic's web crawler that indexes content for Claude's search and citation features.