robots.txt Validator
Paste any robots.txt and instantly validate every rule: the validator groups rules by User-agent and flags malformed Allow/Disallow paths, duplicate lines, bad Sitemap URLs and directives Google ignores.
Your robots.txt
Paste robots.txt contents
Comments (#…) and directives (User-agent, Allow, Disallow, Sitemap, Crawl-delay) are all supported.
Validation Results
Rule groups
Sitemap directives
- robots.txt looks clean — ready to deploy.
How to Audit a robots.txt File
Four quick checks that catch the mistakes most often missed in production robots.txt files.
Paste the live file
Always audit the live https://yourdomain.com/robots.txt — not the staging version. CMS builds often rewrite rules at deploy time.
Check groups per User-agent
Each User-agent: block is a group. Googlebot, Bingbot and Googlebot-Image all read their own group only — generic * rules are a fallback.
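For example, in a file like the sketch below (paths and bot names are only illustrative), Googlebot follows its own group and ignores the * group entirely:

    # Fallback group for all other crawlers
    User-agent: *
    Disallow: /tmp/

    # Googlebot reads only this group, not the * rules above
    User-agent: Googlebot
    Disallow: /tmp/
    Disallow: /internal-search/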
Confirm every Sitemap URL
Sitemap: lines must be absolute URLs. Always add them — they're the fastest way for a crawler to discover your full URL list.
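A quick sketch, with example.com standing in for your own domain; relative paths are not valid here:

    # Valid: absolute URLs, multiple lines allowed
    Sitemap: https://www.example.com/sitemap.xml
    Sitemap: https://www.example.com/sitemap-images.xml

    # Invalid: relative path, most crawlers will ignore it
    # Sitemap: /sitemap.xml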
Remove Google-ignored directives
Crawl-delay, Host, Clean-param, and noindex in robots.txt are all ignored by Googlebot. Either remove them or use the correct alternative (Search Console, meta robots).
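As an illustration (values are placeholders), each commented line below does nothing for Googlebot and has a better home elsewhere:

    User-agent: *
    Disallow: /checkout/
    # Ignored by Googlebot; remove or replace:
    # Crawl-delay: 10            (Bing/Yandex only; manage Google's rate in Search Console)
    # Noindex: /old-page/        (use a meta robots noindex tag instead)
    # Host: www.example.com      (use canonical URLs or redirects instead)
    # Clean-param: sessionid     (Yandex only; use canonical URLs instead)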
robots.txt Directive Reference
What each directive actually does — and which crawlers respect it.
User-agent
Names the crawler a group applies to. Use * for all crawlers, or specific bot names (Googlebot, Bingbot, etc.).
Disallow
Blocks crawling of a URL path. Disallow: / blocks everything, Disallow: /admin/ blocks one folder.
Allow
Whitelists paths inside a Disallowed folder. More specific Allow rules override general Disallow.
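A minimal sketch (folder names are hypothetical); the longer, more specific Allow path wins over the shorter Disallow:

    User-agent: *
    Disallow: /media/
    Allow: /media/press/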
Sitemap
Points crawlers to your sitemap.xml. Must be an absolute URL. Multiple Sitemap lines are allowed.
Crawl-delay
Seconds between requests. Honoured by Bing and Yandex — ignored by Google. Use Search Console to manage Google's crawl rate instead.
Comments (#)
Lines starting with # are comments. Document every non-obvious rule — future you will thank you.
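For instance (the rule and the reason are invented for illustration):

    # Faceted navigation creates near-infinite URL combinations; block the sort parameter
    Disallow: /*?sort=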
Wildcards (*)
Match any sequence of characters in a path. Disallow: /*?filter= blocks all filtered URLs.
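A short sketch, assuming a ?filter= query parameter like the one mentioned above:

    User-agent: *
    # Blocks /products?filter=red, /shoes?filter=size-10&colour=blue, and so on
    Disallow: /*?filter=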
End anchor ($)
Matches end of URL. Disallow: /*.pdf$ blocks only PDF files, not pages that contain .pdf in the path.
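For example (paths are illustrative):

    User-agent: *
    # Matches /downloads/whitepaper.pdf but not /guides/pdf-tips/
    Disallow: /*.pdf$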
Common robots.txt Mistakes
These slip into production regularly — and each one can tank indexing in days. Typical offending rules are sketched below.
Indexing disasters
Noindex confusion
Crawl budget waste
Security leaks
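To make those categories concrete, here is a hedged sketch of the kinds of rules behind them (every path is hypothetical):

    # Indexing disaster: a staging block-all rule shipped to production
    User-agent: *
    Disallow: /

    # Noindex confusion: this line does nothing for Googlebot
    # Noindex: /old-campaign/

    # Crawl budget waste: faceted URLs left crawlable (no rule such as Disallow: /*?sort=)

    # Security leak: robots.txt is public, so listing a secret path advertises it
    Disallow: /secret-admin-panel/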
robots.txt FAQ
Will Disallow remove a URL from Google?
No — Disallow only blocks crawling. The URL can still appear in Google results (without a snippet) if other sites link to it. Use meta noindex for true removal.
Should I block JavaScript and CSS?
Never. Google needs to render your pages to evaluate them. Blocking JS/CSS usually drops rankings — the page renders broken in Google's eyes.
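If a rule like the first one below has crept in (the asset folder name is hypothetical), remove it or re-allow the render-critical files:

    User-agent: *
    # Too broad: also blocks the CSS and JS Google needs to render pages
    Disallow: /assets/

    # Safer: keep the block but re-allow render-critical resources
    Allow: /assets/*.css$
    Allow: /assets/*.js$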
Is my robots.txt sent anywhere?
No. Everything runs in your browser. Nothing is uploaded or stored.
Does my robots.txt need a BOM?
No. Save as plain UTF-8 text without a byte order mark. Many crawlers misread files that start with a BOM.
Want a Full Crawl & Indexability Audit?
Our Australian SEO team audits robots.txt, canonicals, noindex tags, redirects and sitemap coverage — then hands you a fix list in priority order.
- robots + canonical audit
- GSC coverage analysis
- No lock-in commitment
No long-term commitment. Cancel anytime. 100% satisfaction guaranteed.
