
What is a robots.txt file?

The robots.txt file is a plain text file placed in the root directory of a domain and served at /robots.txt. It tells search engine robots (crawlers) which parts of the site they may crawl and which they should skip. It is part of the Robots Exclusion Protocol (REP). Compliance is voluntary, but all major search engines, including Google, Bing, and Yandex, follow these rules.

SEO

Keep crawlers out of duplicate pages, staging environments, admin panels, and other resources that you do not want crawled.

Crawl budget

Limit the crawling frequency with Crawl-delay to avoid overloading the server with search engine robots. Support varies between crawlers; Googlebot ignores this directive.

Sitemap

Indicate the location of the XML sitemap so that search engines can more easily discover all the pages of the site.

robots.txt file syntax

Directive	Example	Meaning
`User-agent`	User-agent: *	Specifies which bot the rules apply to. * means all bots.
`Disallow`	Disallow: /admin/	Blocks access to the specified path and all subdirectories.
`Allow`	Allow: /public/	Allows access to the path, even if the parent path is blocked.
`Sitemap`	Sitemap: https://example.com/sitemap.xml	Points to the location of the XML sitemap for search engines. The value should be a full, absolute URL.
`Crawl-delay`	Crawl-delay: 10	Requests a minimum interval in seconds between consecutive crawler requests. Not every crawler honors it; Googlebot ignores it.
`Disallow`	Disallow:	An empty Disallow allows access to the entire site.
`Disallow`	Disallow: /	Blocks access to the entire site for a given bot.
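
To illustrate how a client might read these directives, here is a minimal sketch using Python's standard-library urllib.robotparser. The file contents and the bot name ExampleBot are made up for the example, and this parser is only one possible interpretation of the rules; individual search engines may behave differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, used only for this illustration.
EXAMPLE_RULES = """\
User-agent: *
Crawl-delay: 10
Disallow: /admin/
Allow: /public/
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(EXAMPLE_RULES)

# Crawl-delay requested for this bot, in seconds (None if the file sets none).
delay = rules.crawl_delay("ExampleBot")

# Path checks against the Disallow / Allow lines of the matching group.
admin_ok = rules.can_fetch("ExampleBot", "/admin/users")
public_ok = rules.can_fetch("ExampleBot", "/public/index.html")

print(delay, admin_ok, public_ok)
```

A polite crawler written this way would wait `delay` seconds between requests and skip any URL for which `can_fetch` returns False.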

Sample robots.txt file

# all bots
User-agent: *
Disallow: /admin/
Disallow: /tmp/
Disallow: /search?
Allow: /public/

# Googlebot only
User-agent: Googlebot
Disallow: /no-google/

# block Bingbot completely
User-agent: Bingbot
Disallow: /

Sitemap: https://example.com/sitemap.xml
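
The sample file above can be checked programmatically. The sketch below feeds it to Python's urllib.robotparser and asks whether a few hypothetical URLs are allowed; the bot name OtherBot and the example paths are assumptions made for the illustration. Note that urllib.robotparser applies the first matching rule in file order, which can differ from the longest-match behavior of some search engines; for this sample the rules do not overlap, so the outcome is the same either way.

```python
from urllib.robotparser import RobotFileParser

SAMPLE = """\
# all bots
User-agent: *
Disallow: /admin/
Disallow: /tmp/
Disallow: /search?
Allow: /public/

# Googlebot only
User-agent: Googlebot
Disallow: /no-google/

# block Bingbot completely
User-agent: Bingbot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())


def allowed(bot: str, path: str) -> bool:
    """Return True if `bot` may fetch `path` on the hypothetical example.com."""
    return parser.can_fetch(bot, "https://example.com" + path)


# A bot without its own section falls back to the "User-agent: *" group.
print("OtherBot  /admin/login ->", allowed("OtherBot", "/admin/login"))
print("OtherBot  /public/page ->", allowed("OtherBot", "/public/page"))

# Googlebot has a dedicated section, so the "*" rules do not apply to it.
print("Googlebot /admin/login ->", allowed("Googlebot", "/admin/login"))
print("Googlebot /no-google/x ->", allowed("Googlebot", "/no-google/x"))

# Bingbot is blocked from everything.
print("Bingbot   /public/page ->", allowed("Bingbot", "/public/page"))

# Sitemap lines are collected separately (site_maps() needs Python 3.8+).
print("Sitemaps:", parser.site_maps())
```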

Frequently Asked Questions (FAQ)

Does robots.txt block page indexing?

Not exactly. robots.txt tells crawlers not to visit certain URLs, but it does not guarantee that a page will stay out of search results. If another page links to the blocked URL, Google may index it without crawling it. To keep a page out of the index, use the noindex meta tag or the X-Robots-Tag HTTP header, and do not block that page in robots.txt: the crawler has to fetch the page to see the noindex instruction.
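
As a rough sketch of the header-based approach, the following minimal WSGI application adds X-Robots-Tag: noindex to responses under a hypothetical /staging/ path. The path, the page body, and the helper response_headers are assumptions made for the illustration; a real site would typically set this header in its web server or framework configuration instead.

```python
def app(environ, start_response):
    """Minimal WSGI app that marks a hypothetical /staging/ area as noindex."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path.startswith("/staging/"):
        # HTTP-header equivalent of <meta name="robots" content="noindex">.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<!doctype html><title>Example page</title>"]


def response_headers(path):
    """Call the app directly and return the headers it would send for `path`."""
    captured = {}

    def start_response(status, headers):
        captured.update(headers)

    app({"PATH_INFO": path}, start_response)
    return captured


print(response_headers("/staging/new-home"))
print(response_headers("/blog/post"))
```

For a quick local trial, the app could be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library.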

Is robots.txt publicly visible?

Yes. The robots.txt file is always publicly available at /robots.txt, and anyone can read it. Do not list paths in it that reveal details of the site structure you would rather not disclose.

What is the difference between Allow and Disallow?

Disallow blocks access to a path; Allow permits it. When both match the same URL, the most specific rule (the one with the longest matching path) wins under the current standard (RFC 9309), and Allow wins if the two are equally specific. Example: with Disallow: /folder/ and Allow: /folder/publiczny/, the crawler will not enter /folder/ but will enter /folder/publiczny/.
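
How a crawler might resolve such a conflict can be sketched in a few lines of Python. This is a simplified illustration of the longest-match rule described in RFC 9309, not the implementation of any particular search engine: the function name is_allowed is hypothetical, and only plain path prefixes are handled (no * or $ wildcards).

```python
def is_allowed(rules, path):
    """Return True if `path` may be crawled under the given rules.

    `rules` is a list of (directive, prefix) pairs, e.g. ("Disallow", "/folder/").
    Only plain prefix matching is implemented; wildcards are not supported.
    """
    best_len = -1
    verdict = True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty prefix (e.g. "Disallow:") matches nothing
        allow = directive.lower() == "allow"
        # The longer prefix wins; on equal length, Allow beats Disallow.
        if len(prefix) > best_len or (len(prefix) == best_len and allow):
            best_len = len(prefix)
            verdict = allow
    return verdict


RULES = [("Disallow", "/folder/"), ("Allow", "/folder/publiczny/")]

print(is_allowed(RULES, "/folder/publiczny/page.html"))  # longest match is the Allow rule
print(is_allowed(RULES, "/folder/secret.html"))          # only the Disallow rule matches
print(is_allowed(RULES, "/elsewhere/"))                  # no rule matches
```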

How to check if robots.txt is working correctly?

Use our tester above: enter the domain or paste your own file, then check whether a specific URL is blocked for the selected bot. You can also use the robots.txt report in Google Search Console (which replaced the older robots.txt Tester) to see how Google fetched and parsed the file.

What does User-agent: * mean?

The asterisk * is a wildcard representing all bots. Rules under User-agent: * apply to every crawler that does not have its own dedicated section. If a bot has its own section (e.g., User-agent: Googlebot), it applies only the rules from that section and ignores the rules for *.

Is the absence of a robots.txt file a problem?

No. Without a robots.txt file, all bots may freely crawl the entire site; this is the default behavior. You need the file only when you want to block certain sections (e.g., /admin/, /staging/) or point to the sitemap location.
