Robots.txt tester
What is a robots.txt file?
The robots.txt file is a plain text file placed in the root directory of a domain and served at /robots.txt. It tells search engine robots (crawlers) which parts of the site they may crawl and which they should skip. It is part of the Robots Exclusion Protocol (REP). Compliance is voluntary, but major search engines such as Google, Bing, and Yandex follow these rules.
Keep crawlers away from duplicates, staging pages, admin panels, and other resources that should not end up in search results.
Limit the crawling frequency using Crawl-delay to avoid overloading the server with search engine robots. Support varies by crawler; Googlebot, for example, ignores this directive.
Indicate the location of the XML sitemap, which makes it easier for search engines to discover all the subpages of the site.
robots.txt file syntax
| Directive | Example | Meaning |
|---|---|---|
| User-agent | User-agent: * | Specifies which bot the rules apply to. * means all bots. |
| Disallow | Disallow: /admin/ | Blocks access to the specified path and all subdirectories. |
| Allow | Allow: /public/ | Allows access to the path, even if the parent path is blocked. |
| Sitemap | Sitemap: https://example.com/sitemap.xml | Points to the location of the XML sitemap for search engines. Use an absolute URL. |
| Crawl-delay | Crawl-delay: 10 | Specifies the minimum interval in seconds between consecutive crawler requests. |
| Disallow | Disallow: | Empty Disallow: allows access to the entire site. |
| Disallow | Disallow: / | Blocks access to the entire site for a given bot. |
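As a rough illustration of how these directives can be read programmatically, the sketch below uses Python's standard `urllib.robotparser` module. The file content and the bot name `ExampleBot` are made up for this example, and `site_maps()` assumes Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" has no group of its own, so the * group applies to it.
crawl_delay = parser.crawl_delay("ExampleBot")
sitemaps = parser.site_maps()  # list of Sitemap URLs, or None if absent
admin_allowed = parser.can_fetch("ExampleBot", "https://example.com/admin/login")
home_allowed = parser.can_fetch("ExampleBot", "https://example.com/")

print(crawl_delay, sitemaps, admin_allowed, home_allowed)
```

A polite crawler could sleep for `crawl_delay` seconds between requests when the value is not `None`.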
Sample robots.txt file
    # all bots
    User-agent: *
    Disallow: /admin/
    Disallow: /tmp/
    Disallow: /search?
    Allow: /public/

    # Googlebot only
    User-agent: Googlebot
    Disallow: /no-google/

    # block Bingbot completely
    User-agent: Bingbot
    Disallow: /

    Sitemap: https://example.com/sitemap.xml
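To see how a parser might interpret the sample file above, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExampleBot` and the tested URLs are assumptions chosen for illustration. Note that this module applies the first matching rule in file order, which can differ from how individual search engines resolve conflicts, so treat the results as an approximation.

```python
from urllib.robotparser import RobotFileParser

# The sample file from this page, one directive per line.
SAMPLE = """\
# all bots
User-agent: *
Disallow: /admin/
Disallow: /tmp/
Disallow: /search?
Allow: /public/

# Googlebot only
User-agent: Googlebot
Disallow: /no-google/

# block Bingbot completely
User-agent: Bingbot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())

base = "https://example.com"
# A bot without its own section falls back to the * rules.
generic_admin = parser.can_fetch("ExampleBot", base + "/admin/")
generic_public = parser.can_fetch("ExampleBot", base + "/public/page")
# Googlebot has its own section, so the * rules do not apply to it.
google_admin = parser.can_fetch("Googlebot", base + "/admin/")
google_blocked = parser.can_fetch("Googlebot", base + "/no-google/x")
# Bingbot is blocked everywhere.
bing_public = parser.can_fetch("Bingbot", base + "/public/page")

print(generic_admin, generic_public, google_admin, google_blocked, bing_public)
```

One consequence worth noticing: because Googlebot has a dedicated section, /admin/ is not blocked for it in this sample. To block it there as well, the rule would have to be repeated in the Googlebot section.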
Frequently Asked Questions (FAQ)
Does robots.txt keep a page out of search results?
Not exactly. The robots.txt file tells crawlers not to visit certain URLs, but it does not guarantee that a page stays out of search results. If another page links to the blocked URL, Google may index it without crawling it. To block indexing reliably, use the noindex meta tag or the X-Robots-Tag header, and leave the page crawlable so the crawler can actually see that directive.
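The two noindex mechanisms can be checked with a short script. The sketch below is a simplified illustration, not a complete implementation: the function name `is_noindex` is hypothetical, it only looks for the literal token `noindex`, and it ignores bot-specific meta tags such as `<meta name="googlebot">`.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            self.directives.append(attr_map.get("content", "").lower())


def is_noindex(headers, html):
    """Return True if the response headers or the HTML request noindex."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    parser = _RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in directive for directive in parser.directives)


print(is_noindex({"X-Robots-Tag": "noindex, nofollow"}, ""))
print(is_noindex({}, '<head><meta name="robots" content="noindex"></head>'))
print(is_noindex({}, '<head><meta name="robots" content="index, follow"></head>'))
```

The header variant is useful for non-HTML resources such as PDFs, where a meta tag cannot be added.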
Is the robots.txt file publicly visible?
Yes. The robots.txt file is always publicly available at /robots.txt, and anyone can read it. Do not list paths in it that reveal parts of the site structure you want to keep private.
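What is the difference between Disallow and Allow?
Disallow blocks access to a path, and Allow permits it. When both rules match the same URL, the most specific rule (the one with the longest matching path) wins; this is how RFC 9309 and Google resolve conflicts. If the matching rules are equally specific, Allow wins. Example: with Disallow: /folder/ and Allow: /folder/publiczny/, the crawler will not enter /folder/ but will enter /folder/publiczny/.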
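The interaction between Allow and Disallow can be sketched as a longest-match lookup, which is the conflict resolution described in RFC 9309. The function `is_allowed` and its rule format are hypothetical and simplified: rules are plain path prefixes, and the wildcard characters `*` and `$` are not handled.

```python
def is_allowed(path, rules):
    """Decide whether `path` may be crawled.

    `rules` is a list of (directive, prefix) tuples,
    for example ("disallow", "/folder/").
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty patterns match nothing; skip non-matching prefixes
        is_allow = directive.lower() == "allow"
        # The longer prefix wins; on equal length, prefer Allow.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [
    ("disallow", "/folder/"),
    ("allow", "/folder/publiczny/"),
]

print(is_allowed("/folder/secret.html", rules))
print(is_allowed("/folder/publiczny/page.html", rules))
print(is_allowed("/other/page.html", rules))
```

Because the decision depends on prefix length rather than on line order, swapping the two rules in the file would not change the outcome under this scheme.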
How do I test my robots.txt file?
Use our tester above: enter the domain or paste your own file and check whether a specific URL is blocked for the selected bot. In Google Search Console, the robots.txt report (which replaced the older robots.txt Tester) shows how Google fetched and parsed your file.
What does User-agent: * mean?
The asterisk * is a wildcard representing all bots. Rules under User-agent: * apply to every crawler that does not have its own dedicated section. If a bot has its own section (e.g., User-agent: Googlebot), it applies only the rules from that section and ignores the rules for *.
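The section selection described here can be sketched in a few lines. This is a simplified illustration under stated assumptions: `select_group` is a hypothetical helper, groups are stored in a plain dict keyed by user-agent token, and matching is a case-insensitive substring test, which real crawlers may implement differently.

```python
def select_group(groups, user_agent):
    """Return the rules for the section that applies to `user_agent`.

    `groups` maps a user-agent token (or "*") to a list of rule strings.
    """
    ua = user_agent.lower()
    # Collect dedicated sections whose token appears in the bot's name.
    matches = [token for token in groups if token != "*" and token.lower() in ua]
    if matches:
        # Prefer the most specific (longest) token; the * rules are ignored.
        return groups[max(matches, key=len)]
    # No dedicated section: fall back to *, or to no rules at all.
    return groups.get("*", [])


groups = {
    "*": ["Disallow: /admin/"],
    "Googlebot": ["Disallow: /no-google/"],
}

print(select_group(groups, "Googlebot/2.1"))
print(select_group(groups, "ExampleBot/1.0"))
```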
Do I need a robots.txt file?
No. Without a robots.txt file, all bots may freely crawl the entire site; this is the default behavior. You only need the file when you want to block certain sections (e.g., /admin/, /staging/) or indicate the sitemap location.