Robots.txt Generator — control crawlers, AI bots, indexing
Pick a preset (allow all, block AI bots, standard) or build per-bot rules manually. The robots.txt content is generated live, ready to paste at your domain root.
Example output with /admin/ and /? disallowed:

User-agent: *
Disallow: /admin/
Disallow: /?

Sitemap: https://example.com/sitemap.xml
robots.txt is a plain-text file at the root of your domain (https://example.com/robots.txt) that asks crawlers what they may and may not fetch. Honest crawlers — Googlebot, Bingbot, well-behaved researchers — read it before indexing. The Allow/Disallow rules are advisory: they don't prevent malicious bots, but they do prevent legitimate ones from indexing the wrong things.
Since 2023, robots.txt has become the standard mechanism for opting out of AI training crawlers (GPTBot, ClaudeBot, CCBot, PerplexityBot…). Each company publishes the user-agent string they crawl with; you simply add a User-agent + Disallow: / pair per company. This generator includes presets for the major AI bots and the standard SEO defaults.
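The AI-bot preset is just a stack of those pairs, one group per company (user-agent strings as published by each vendor; check their docs for the current list):

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
```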
No. It's a polite request that well-behaved crawlers follow. Malicious or aggressive scrapers ignore it entirely. To enforce blocking, you need rate-limiting, IP firewalls, or authentication — robots.txt is for cooperation, not security.
No. GPTBot is OpenAI's training crawler; Google ranking uses Googlebot, which is a different user-agent. You can block AI training crawlers without affecting search visibility — just keep Googlebot allowed.
At the root of your domain, served as plain text at https://example.com/robots.txt. It applies to that specific subdomain — robots.txt at the apex doesn't cover blog.example.com, which needs its own file.
Most major crawlers (Google, Bing) support * to match any sequence of characters and $ to anchor to the end of the URL. So Disallow: /*.pdf$ blocks PDF files, and Disallow: /admin/* blocks anything inside /admin/. Yandex and some smaller crawlers have more limited wildcard support.
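The matching semantics can be sketched in a few lines of Python — an illustrative regex translation of * and $, not the exact code any crawler runs:

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Google-style robots.txt path matching: rules are prefix matches,
    * matches any sequence of characters, a trailing $ anchors to end-of-URL."""
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"  # trailing $ means "URL ends here"
    return re.match(pattern, path) is not None

print(rule_matches("/*.pdf$", "/docs/report.pdf"))       # True
print(rule_matches("/*.pdf$", "/docs/report.pdf?x=1"))   # False: $ anchors the end
print(rule_matches("/admin/*", "/admin/users"))          # True
print(rule_matches("/admin/", "/admin/users"))           # True: plain prefix match
```

Note that without the trailing $, every rule is effectively a prefix match, which is why /admin/ and /admin/* behave the same for most URLs.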
It tells the bot to wait N seconds between requests. Useful when a single crawler is overwhelming a small server. Google ignores Crawl-delay (use Search Console's crawl rate setting instead); Bing and Yandex do honor it.
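Python's standard-library parser understands Crawl-delay, which makes it easy to check which user-agent group a delay actually applies to (a small sketch using a hypothetical SlowBot group):

```python
from urllib.robotparser import RobotFileParser

robots = """\
User-agent: SlowBot
Crawl-delay: 10

User-agent: *
Disallow:
"""

rp = RobotFileParser()
rp.parse(robots.splitlines())

print(rp.crawl_delay("SlowBot"))   # 10 — matches the SlowBot group
print(rp.crawl_delay("OtherBot"))  # None — the catch-all group sets no delay
```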
Where a thoughtful robots.txt prevents real problems.
Disallow /admin/, /staging/, /preview/, and search-result URLs (/search?q=). These pages should not appear in Google; if they do, that's wasted crawl budget plus possible duplicate-content issues.
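A minimal block covering those paths might look like this (the paths are examples; adjust them to your site's actual URL structure):

```
User-agent: *
Disallow: /admin/
Disallow: /staging/
Disallow: /preview/
Disallow: /search    # prefix match also covers /search?q=...
```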
If you don't want your content used to train LLMs, the easiest mitigation is the AI-bot block preset (GPTBot, ClaudeBot, CCBot, PerplexityBot). Reputable companies respect it, and a documented opt-out strengthens your position if a crawler that claims to honor robots.txt ignores it and the dispute ever reaches court.
Adding a Sitemap: URL in robots.txt is the lowest-friction way to point all crawlers at your sitemap. Search Console picks up the sitemap reference automatically — no manual submission needed.
If a particular Russia-based bot is generating 80% of your traffic and giving zero referral value, add a Crawl-delay rule (or block it outright with Disallow: /). Server CPU is finite; not all crawl is worth it.
Habits that keep robots.txt useful and safe.
Googlebot needs to fetch CSS and JavaScript to render your pages. Blocking them via Disallow: /static/ or similar can break ranking. Verify using the URL Inspection tool in Search Console before disallowing static asset directories.
Adding a path to Disallow makes it discoverable — anyone can read your robots.txt and see exactly what you're trying to hide. Sensitive URLs need authentication, not robots.txt entries.
Test your draft before deploying it: Google Search Console's robots.txt report shows how Google parses your live file, and third-party robots.txt testers let you paste a draft and check that specific URLs would be allowed or disallowed as you expect. A typo can de-index half your site overnight.
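For a quick local sanity check, Python's urllib.robotparser can evaluate a draft against specific URLs (note it implements plain prefix matching, not Google-style wildcards, so test wildcard rules elsewhere):

```python
from urllib.robotparser import RobotFileParser

draft = """\
User-agent: *
Disallow: /admin/
Disallow: /staging/
"""

rp = RobotFileParser()
rp.parse(draft.splitlines())

# Spot-check the URLs you care about before uploading the file.
print(rp.can_fetch("*", "https://example.com/admin/users"))  # False
print(rp.can_fetch("*", "https://example.com/blog/post"))    # True
```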
If you want a page out of the index but still crawlable (so Google sees the noindex), use a meta robots noindex tag, not Disallow. Disallowing prevents the crawler from seeing the noindex and the URL can stay indexed forever.
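For reference, the tag is a one-liner in the page's head; for non-HTML resources like PDFs, the X-Robots-Tag HTTP response header does the same job:

```html
<!-- Page stays crawlable, but is excluded from the index -->
<meta name="robots" content="noindex">
```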