It’s public on purpose
Anyone (or any bot) can read /robots.txt. Hiding sensitive paths there just advertises them—use auth for real secrecy.
Place this file at https://yourdomain.com/robots.txt. Use comments (lines starting with #) to annotate choices.
robots.txt tells crawlers which parts of your site they should or shouldn’t fetch. It is public and simple: use it to manage crawl behavior (server load, duplicate paths, parameters). It is neither a security feature nor a guaranteed de-indexing tool.
The file lives at the root of the host, e.g. https://example.com/robots.txt. Subdomains need their own file (e.g., https://blog.example.com/robots.txt).
Rules are grouped by crawler: User-agent: * covers everyone; add named groups to override rules for a specific crawler (see the bot-specific examples below).
Use Allow: lines for exceptions inside a broader Disallow.
To prevent indexing, use <meta name="robots" content="noindex"> or an X-Robots-Tag: noindex header, not robots.txt alone.
To block assets or file types, match paths or patterns such as /assets or /*.js$.
Wildcards: * and $ are widely supported, but test patterns.
# Allow everything (baseline)
User-agent: *
Allow: /
# Typical private paths
User-agent: *
Disallow: /admin
Disallow: /login
Disallow: /cart
Disallow: /checkout
Disallow: /cgi-bin
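Keep in mind that Disallow values are matched as path prefixes, so a short rule can catch more than you intend. A small illustration (the paths are hypothetical):

# Prefix matching (illustrative)
User-agent: *
Disallow: /admin     # matches /admin, /admin/users, and also /administrator
Disallow: /admin/    # matches only URLs under /admin/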
# Parameter cleanup (use carefully; combine with canonicals)
User-agent: *
Disallow: /*?utm_*
Disallow: /*&utm_*
Disallow: /*?replytocom=*
Disallow: /*?sort=*
Allow: /*?page=1$
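The comment above mentions canonicals: the parameter rules keep crawlers off duplicate URLs, while a canonical link tells engines which version to index. A minimal sketch (the URL is illustrative):

<link rel="canonical" href="https://example.com/products/widget">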
# Carve out a help page inside /admin
User-agent: *
Disallow: /admin
Allow: /admin/help
# Stricter rules for a specific bot (example)
User-agent: Bingbot
Crawl-delay: 5
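As noted above, a named group overrides the catch-all group rather than adding to it: a bot that matches User-agent: Bingbot follows only that group, so repeat any shared rules there. A sketch with illustrative paths:

# A named group replaces the * group for that bot
User-agent: *
Disallow: /tmp

User-agent: Bingbot
Disallow: /tmp        # repeated here; Bingbot reads only its own group
Disallow: /search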
# Sitemaps
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
To keep a page out of search results, use noindex; Disallow alone isn’t sufficient if URLs leak through links.
Never rely on robots.txt for secrecy; use authentication and 401/403 responses.
Before you ship, run through a few checks: the file is served at /robots.txt (200 OK, text/plain); you haven’t disallowed / by accident; wildcards and anchors behave as intended; anything you want de-indexed uses noindex or a proper 404/410, not just Disallow (a minimal noindex snippet follows the e-commerce example below).
# E-commerce (typical)
User-agent: *
Disallow: /admin
Disallow: /login
Disallow: /cart
Disallow: /checkout
Disallow: /search
Disallow: /*?utm_*
Allow: /
Sitemap: https://shop.example.com/sitemap.xml
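As the checklist stresses, Disallow only controls crawling. To keep a URL out of the index, use the noindex signals mentioned earlier, and leave the page crawlable so the signal can be seen. A minimal sketch:

In the page’s HTML head:
<meta name="robots" content="noindex">

Or as an HTTP response header (handy for PDFs and other non-HTML files):
X-Robots-Tag: noindex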
Tip: After updating, fetch your robots.txt in a browser to verify output, then test patterns in each search engine’s robots tester.
Bots pick the longest matching path in a group. A single Allow: /admin/help can override a broader Disallow: /admin.
If robots.txt is missing, crawlers assume no restrictions. A 5xx error, however, can make them back off.
You can list multiple Sitemap: URLs; they’re just hints, not directives—but major engines honor them.
Only * (any chars) and $ (end anchor) are widely supported. Fancy regex-style patterns aren’t part of the spec.
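A few pattern examples to make the matching concrete (paths are illustrative; verify them in a robots tester as suggested above):

User-agent: *
Disallow: /*.pdf$          # $ anchors the end: blocks /docs/report.pdf but not /pdf/index.html
Disallow: /*?print=        # * matches any characters: blocks /page?print=1
Allow: /downloads/*.pdf$   # the longer rule wins, so PDFs under /downloads/ stay crawlable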