Generate robots.txt files with a visual editor. Block bots, set crawl rules, and add sitemaps, with templates for common setups.
About This Tool
The robots.txt file tells search engine crawlers which pages they can and cannot access on your website. It sits in your site's root directory and is the first file any well-behaved crawler checks before indexing your content.
A misconfigured robots.txt can block Google from crawling your entire site, or accidentally leave admin pages you wanted off-limits open to crawlers. This generator helps you build a correct robots.txt with a visual editor: add rules per bot, set allow/disallow paths, include sitemap URLs, and use templates for common setups.
The generated file follows the Robots Exclusion Protocol standard supported by Google, Bing, Yandex, and all major search engines.
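For example, a minimal robots.txt that lets every crawler in but keeps them out of an admin area looks like this (the /admin/ path and the domain are just placeholders):

    User-agent: *
    Disallow: /admin/

    Sitemap: https://yourdomain.com/sitemap.xml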
How to Use
1. Start with a template or build from scratch
2. Add user-agent groups: each group targets a specific bot (or all bots with *)
3. Add Allow and Disallow rules for each group
4. Optionally set Crawl-delay for bots that support it
5. Add your sitemap URL(s) at the bottom
6. Copy the generated robots.txt and upload it to your site's root directory
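For example, a file built this way, with a wildcard group, a Bing-specific crawl delay, and a sitemap, might come out like this (the paths and domain are placeholders):

    User-agent: *
    Allow: /
    Disallow: /search/

    User-agent: Bingbot
    Crawl-delay: 10

    Sitemap: https://yourdomain.com/sitemap.xml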
Frequently Asked Questions
Where do I put the robots.txt file?
The robots.txt file must be placed in the root directory of your website, accessible at https://yourdomain.com/robots.txt. It won't work in subdirectories.
Does robots.txt block pages from appearing in Google?
Not exactly. Robots.txt blocks crawling, not indexing. If other sites link to a page you've disallowed, Google may still show it in search results (without a snippet). To fully block indexing, use a 'noindex' meta tag instead, and make sure the page isn't also disallowed in robots.txt, since Google has to crawl the page to see the tag.
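For example, a page you want kept out of search results would carry this tag in its HTML head:

    <meta name="robots" content="noindex">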
What does 'User-agent: *' mean?
The asterisk (*) is a wildcard that matches all crawlers. Rules under 'User-agent: *' apply to every bot that doesn't have its own specific section in the file.
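For example, in the file below (the paths are placeholders), Googlebot follows only its own group and ignores the wildcard rules, while every other crawler falls back to the wildcard group:

    User-agent: Googlebot
    Disallow: /drafts/

    User-agent: *
    Disallow: /admin/
    Disallow: /drafts/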
Should I block AI crawlers like GPTBot?
That depends on your preference. If you don't want your content used for AI training, you can add 'User-agent: GPTBot' with 'Disallow: /' to block OpenAI's crawler. Similar rules exist for other AI bots like Google-Extended and CCBot.
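For example, these groups block the three AI crawlers mentioned above from the entire site:

    User-agent: GPTBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: CCBot
    Disallow: /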
What is Crawl-delay?
Crawl-delay tells bots to wait a specified number of seconds between requests. It's supported by Bing and Yandex but ignored by Google. Use it if aggressive crawling is overloading your server.
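For example, this asks Bingbot to wait 10 seconds between requests (the delay value is just an illustration):

    User-agent: Bingbot
    Crawl-delay: 10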
Can I use wildcards in robots.txt paths?
Google and Bing support limited wildcards: * matches any sequence of characters, and $ marks the end of a URL. For example, 'Disallow: /*.pdf$' blocks all PDF files. Not all bots support these extensions.
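For example, this group blocks every URL ending in .pdf for crawlers that understand these extensions:

    User-agent: *
    Disallow: /*.pdf$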