Robots.txt Builder
Place this file at https://yourdomain.com/robots.txt. Use comments (lines starting with #) to annotate choices.
Robots.txt: Best Practices & Examples
robots.txt tells crawlers which parts of your site they should or shouldn’t fetch. It is public and simple; use it to manage crawl behavior (server load, duplicate paths, parameters). It is neither a security feature nor a guaranteed de-indexing tool.
Key principles
- Place it at the root: https://example.com/robots.txt. Subdomains need their own file (e.g., https://blog.example.com/robots.txt).
- Start with a “catch-all” group: User-agent: *. Add named groups to override it for specific crawlers.
- Be explicit: Many crawlers pick the most specific match. Use explicit Allow: lines for exceptions.
- Disallow ≠ removal: Use <meta name="robots" content="noindex"> or X-Robots-Tag: noindex to prevent indexing.
- Don’t block critical assets: Avoid blanket rules like /assets or /*.js$.
- Wildcards & anchors: * and $ are widely supported, but test patterns (see the sketch after this list for a quick local check).
- Crawl-delay & Host: Not universal; use sparingly.
- List sitemaps: Use absolute URLs (multiple allowed).
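A quick way to sanity-check plain prefix rules locally is Python’s standard-library urllib.robotparser. This is a minimal sketch; note that the stdlib parser follows the original spec, so it does not understand * or $ wildcards or longest-match precedence, and wildcard patterns should still be verified in each search engine’s own tester. The rules and URLs below are illustrative.

from urllib.robotparser import RobotFileParser

# Rules mirroring the "typical private paths" pattern below.
RULES = """\
User-agent: *
Disallow: /admin
Disallow: /cart
Allow: /
"""

rp = RobotFileParser()
rp.parse(RULES.splitlines())  # or rp.set_url(...) + rp.read() to fetch a live file

for path in ("/", "/admin", "/admin/users", "/cart", "/products/shoes"):
    allowed = rp.can_fetch("*", "https://example.com" + path)
    print(f"{path:<16} allowed: {allowed}")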
Common patterns
# Allow everything (baseline)
User-agent: *
Allow: /
# Typical private paths
User-agent: *
Disallow: /admin
Disallow: /login
Disallow: /cart
Disallow: /checkout
Disallow: /cgi-bin
# Parameter cleanup (use carefully; combine with canonicals)
User-agent: *
Disallow: /*?utm_*
Disallow: /*&utm_*
Disallow: /*?replytocom=*
Disallow: /*?sort=*
Allow: /*?page=1$
# Carve out a help page inside /admin
User-agent: *
Disallow: /admin
Allow: /admin/help
# Stricter rules for a specific bot (example)
User-agent: Bingbot
Crawl-delay: 5
# Sitemaps
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
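To read directives like these back programmatically, the same stdlib parser exposes crawl delays and sitemap URLs. A minimal sketch: site_maps() needs Python 3.8+, and example.com is a placeholder domain.

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # placeholder domain
rp.read()  # fetch and parse the live file

print(rp.crawl_delay("Bingbot"))  # 5 if the Bingbot group above is present, else None
print(rp.site_maps())             # list of Sitemap URLs, or None if absent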
Staging & sensitive content
- Staging sites: Protect with HTTP auth and/or noindex. Disallow: / alone isn’t sufficient if URLs leak.
- Private data: Never rely on robots.txt for secrecy; use authentication and 401/403 responses.
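For the noindex route, the X-Robots-Tag header can be attached by the application or the web server. A minimal sketch assuming a Flask staging app; the framework choice and route are illustrative, not part of the guide above.

from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    return "staging"

@app.after_request
def block_indexing(response):
    # Ask crawlers not to index or follow anything served by this staging instance.
    response.headers["X-Robots-Tag"] = "noindex, nofollow"
    return response

Pair a header like this with HTTP auth at the server so leaked staging URLs are neither crawlable nor indexable.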
Troubleshooting checklist
- File reachable at /robots.txt (200 OK, text/plain)?
- Paths start with /? Wildcards/anchors correct?
- Critical assets (CSS/JS/images) not blocked?
- Sitemap URLs are absolute and live?
- For removals, use noindex or proper 404/410, not just Disallow.
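The reachability and sitemap-format checks are easy to automate with the standard library. A rough sketch: example.com is a placeholder, and urlopen raises on 4xx/5xx responses rather than printing a status.

from urllib.request import urlopen
from urllib.parse import urlparse

ROBOTS_URL = "https://example.com/robots.txt"  # placeholder

with urlopen(ROBOTS_URL) as resp:
    body = resp.read().decode("utf-8", errors="replace")
    print("status:", resp.status)                    # expect 200
    print("type:", resp.headers.get_content_type())  # expect text/plain

# Sitemap lines must carry absolute URLs; fetch each one separately to confirm it is live.
for line in body.splitlines():
    if line.lower().startswith("sitemap:"):
        sitemap = line.split(":", 1)[1].strip()
        parts = urlparse(sitemap)
        print(sitemap, "absolute" if parts.scheme and parts.netloc else "NOT absolute")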
Quick template
# E-commerce (typical)
User-agent: *
Disallow: /admin
Disallow: /login
Disallow: /cart
Disallow: /checkout
Disallow: /search
Disallow: /*?utm_*
Allow: /
Sitemap: https://shop.example.com/sitemap.xml
Tip: After updating, fetch your robots.txt in a browser to verify output, then test patterns in each search engine’s robots tester.