Robots.txt Builder

Create a clean robots.txt with user-agent groups, Allow/Disallow rules, and optional Sitemap, Crawl-delay, and Host. Private by design—everything runs locally.

Place this file at https://yourdomain.com/robots.txt. Use comments (lines starting with #) to annotate choices.

Robots.txt: Best Practices & Examples

robots.txt tells crawlers which parts of your site they should or shouldn’t fetch. It is public and simple: use it to manage crawl behavior (server load, duplicate paths, parameters). It is neither a security feature nor a guaranteed de-indexing tool.

Key principles

  • Place it at the root: https://example.com/robots.txt. Subdomains need their own file (e.g., https://blog.example.com/robots.txt).
  • Start with a “catch-all” group: User-agent: *. Add named groups to override it for specific crawlers; a crawler that matches a named group typically follows only that group, not the * rules.
  • Be explicit: Many crawlers pick the most specific match. Use explicit Allow: lines for exceptions.
  • Disallow ≠ removal: Use <meta name="robots" content="noindex"> or X-Robots-Tag: noindex to prevent indexing.
  • Don’t block critical assets: Avoid blanket rules like Disallow: /assets or Disallow: /*.js$; crawlers need CSS, JS, and images to render pages correctly.
  • Wildcards & anchors: * and $ are widely supported, but test patterns (see the sketch after this list).
  • Crawl-delay & Host: Not universally supported (Google ignores both); use sparingly, if at all.
  • List sitemaps: Use absolute URLs (multiple allowed).
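
For instance, here is a small wildcard/anchor sketch (the paths are hypothetical; verify patterns in each engine's robots tester):

# Wildcard & anchor examples (hypothetical paths)
User-agent: *
# $ anchors the end of the URL: blocks /files/report.pdf but not /pdf-guides/
Disallow: /*.pdf$
# * matches any run of characters: blocks any URL containing ?sessionid=
Disallow: /*?sessionid=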

Common patterns

# Allow everything (baseline)
User-agent: *
Allow: /

# Typical private paths
User-agent: *
Disallow: /admin
Disallow: /login
Disallow: /cart
Disallow: /checkout
Disallow: /cgi-bin

# Parameter cleanup (use carefully; combine with canonicals)
User-agent: *
Disallow: /*?utm_*
Disallow: /*&utm_*
Disallow: /*?replytocom=*
Disallow: /*?sort=*
Allow: /*?page=1$

# Carve out a help page inside /admin
User-agent: *
Disallow: /admin
Allow: /admin/help

# Stricter rules for a specific bot (example)
User-agent: Bingbot
Crawl-delay: 5

# Sitemaps
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml

Staging & sensitive content

  • Staging sites: Protect with HTTP auth and/or noindex (a header sketch follows this list). Disallow: / alone isn’t sufficient if URLs leak.
  • Private data: Never rely on robots.txt for secrecy—use authentication and 401/403 responses.
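
As a rough sketch of the noindex-by-header approach mentioned above (the exact configuration depends on your server stack), a staging page's response could carry:

# Hypothetical HTTP response headers for a staging page
HTTP/1.1 200 OK
Content-Type: text/html
X-Robots-Tag: noindex, nofollow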

Troubleshooting checklist

  • File reachable at /robots.txt (200 OK, text/plain)? (See the curl check after this list.)
  • Paths start with /? Wildcards/anchors correct?
  • Critical assets (CSS/JS/images) not blocked?
  • Sitemap URLs are absolute and live?
  • For removals, use noindex or proper 404/410—not just Disallow.
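
A quick way to run the reachability check from a terminal (assuming curl is installed; substitute your own domain):

curl -sI https://example.com/robots.txt
# Look for a 200 status and a Content-Type of text/plain in the printed headers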

Quick template

# E-commerce (typical)
User-agent: *
Disallow: /admin
Disallow: /login
Disallow: /cart
Disallow: /checkout
Disallow: /search
Disallow: /*?utm_*
Allow: /

Sitemap: https://shop.example.com/sitemap.xml

Tip: After updating, fetch your robots.txt in a browser to verify output, then test patterns in each search engine’s robots tester.

5 Fun Facts about robots.txt

It’s public on purpose

Anyone (or any bot) can read /robots.txt. Hiding sensitive paths there just advertises them—use auth for real secrecy.

Most specific wins

Bots pick the longest matching path in a group. A single Allow: /admin/help can override a broader Disallow: /admin.

404 means “no rules”

If robots.txt is missing, crawlers assume no restrictions. A 5xx error, however, can make them back off.

Sitemaps piggyback here

You can list multiple Sitemap: URLs; they’re just hints, not directives—but major engines honor them.

Wildcards are limited

Only * (any chars) and $ (end anchor) are widely supported. Fancy regex-style patterns aren’t part of the spec.
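
For example (hypothetical paths), regex-style alternation has to be spelled out as separate rules:

User-agent: *
# Not supported: a crawler reads this as the literal path "/(drafts|tmp)/"
# Disallow: /(drafts|tmp)/
# Write separate rules instead:
Disallow: /drafts/
Disallow: /tmp/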
