Robots.txt Builder

Create a clean robots.txt with user-agent groups, Allow/Disallow rules, and optional Sitemap, Crawl-delay, and Host. Private by design—everything runs locally.

Place the generated file at https://yourdomain.com/robots.txt. Use comments (lines starting with #) to annotate choices.

Robots.txt: Best Practices & Examples

robots.txt tells crawlers which parts of your site they should or shouldn’t fetch. It is public and simple: use it to manage crawl behavior (server load, duplicate paths, parameters). It is neither a security feature nor a guaranteed de-indexing tool.

Key principles

  • Place it at the root: https://example.com/robots.txt. Subdomains need their own file (e.g., https://blog.example.com/robots.txt).
  • Start with a “catch-all” group: User-agent: *. Add named groups only where a specific crawler needs different rules.
  • Be explicit: Many crawlers pick the most specific (longest) matching rule. Use explicit Allow: lines for exceptions.
  • Disallow ≠ removal: A disallowed URL can still be indexed from external links. Use <meta name="robots" content="noindex"> or an X-Robots-Tag: noindex response header to prevent indexing (an X-Robots-Tag sketch appears under “Staging & sensitive content” below).
  • Don’t block critical assets: Avoid blanket rules like Disallow: /assets or Disallow: /*.js$; crawlers need CSS and JS to render pages.
  • Wildcards & anchors: * and $ are widely supported but not universal; test patterns before relying on them (see the sketch after this list).
  • Crawl-delay & Host: Non-standard and ignored by several major crawlers; use sparingly, if at all.
  • List sitemaps: Use absolute URLs (multiple Sitemap: lines are allowed).
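
The wildcard behavior above is easy to misapply, so here is a rough sketch in Python of how a crawler might expand * and a trailing $ when matching rule paths. The helper name robots_pattern_to_regex is invented for illustration; real crawlers implement their own matching, and edge cases can differ.

import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    # Robots.txt rules are prefix matches: '*' matches any run of
    # characters, and a trailing '$' anchors the end of the URL.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile("^" + regex + ("$" if anchored else ""))

js_rule = robots_pattern_to_regex("/*.js$")
print(bool(js_rule.match("/assets/app.js")))             # True: the rule applies
print(bool(js_rule.match("/assets/app.js?v=2")))         # False: '$' stops at the query string

utm_rule = robots_pattern_to_regex("/*?utm_*")
print(bool(utm_rule.match("/landing?utm_source=mail")))  # True

The /*.js$ case also shows why such blanket rules are risky: they catch every script on the site.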

Common patterns

# Allow everything (baseline)
User-agent: *
Allow: /

# Typical private paths
User-agent: *
Disallow: /admin
Disallow: /login
Disallow: /cart
Disallow: /checkout
Disallow: /cgi-bin

# Parameter cleanup (use carefully; combine with canonicals)
User-agent: *
Disallow: /*?utm_*
Disallow: /*&utm_*
Disallow: /*?replytocom=*
Disallow: /*?sort=*
Allow: /*?page=1$

# Carve out a help page inside /admin
User-agent: *
Disallow: /admin
Allow: /admin/help

# Stricter rules for a specific bot (example)
User-agent: Bingbot
Crawl-delay: 5

# Sitemaps
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml

Staging & sensitive content

  • Staging sites: Protect with HTTP auth and/or noindex; Disallow: / alone isn’t sufficient if URLs leak (a minimal server-side sketch follows this list).
  • Private data: Never rely on robots.txt for secrecy—use authentication and 401/403 responses.
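
As a minimal sketch of the staging advice above (first bullet), the handler below rejects unauthenticated requests with 401 and stamps an X-Robots-Tag: noindex header on everything it does serve. It uses only Python's standard library; the credentials, port, and served directory are placeholders, not a production setup.

import base64
import http.server

STAGING_CREDENTIALS = base64.b64encode(b"staging:change-me").decode()  # placeholder

class StagingHandler(http.server.SimpleHTTPRequestHandler):
    def end_headers(self):
        # Belt and braces: even authenticated pages are marked noindex.
        self.send_header("X-Robots-Tag", "noindex")
        super().end_headers()

    def do_GET(self):
        if self.headers.get("Authorization", "") != "Basic " + STAGING_CREDENTIALS:
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="staging"')
            self.end_headers()
            return
        super().do_GET()

if __name__ == "__main__":
    http.server.HTTPServer(("", 8080), StagingHandler).serve_forever()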

Troubleshooting checklist

  • File reachable at /robots.txt (200 OK, text/plain)? See the fetch snippet after this list.
  • Paths start with /? Wildcards/anchors correct?
  • Critical assets (CSS/JS/images) not blocked?
  • Sitemap URLs are absolute and live?
  • For removals, use noindex or proper 404/410—not just Disallow.
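
A quick way to run the first two checks in this list yourself is to fetch the file and inspect the response. The sketch below uses Python's urllib; https://example.com is a placeholder for your own domain.

import urllib.request

url = "https://example.com/robots.txt"  # placeholder domain
with urllib.request.urlopen(url) as resp:
    print(resp.status)                       # expect 200
    print(resp.headers.get("Content-Type"))  # expect text/plain (a charset suffix is fine)
    body = resp.read().decode("utf-8", errors="replace")

# Print the rules for a quick eyeball check: every path should start with '/'.
for line in body.splitlines():
    if line.strip() and not line.startswith("#"):
        print(line)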

Quick template

# E-commerce (typical)
User-agent: *
Disallow: /admin
Disallow: /login
Disallow: /cart
Disallow: /checkout
Disallow: /search
Disallow: /*?utm_*
Allow: /

Sitemap: https://shop.example.com/sitemap.xml

Tip: After updating, fetch your robots.txt in a browser to verify output, then test patterns in each search engine’s robots tester.
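
For a rough local check before reaching for those testers, Python's built-in urllib.robotparser can answer can-fetch questions against a live file. Note that it follows the original robots.txt spec and does not understand * or $ wildcards, so it is a first pass, not a substitute for each engine's own tool; the domain below is a placeholder, and the expected results assume the e-commerce template above.

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://shop.example.com/robots.txt")  # placeholder domain
rp.read()

# Expected results if the live file matches the e-commerce template above:
print(rp.can_fetch("Googlebot", "https://shop.example.com/admin"))        # False
print(rp.can_fetch("Googlebot", "https://shop.example.com/products/42"))  # True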
