Robots.txt Generator

Generate Robots.txt File

Create your custom robots.txt file using our tool. Choose from presets like Allow All, Block All, WordPress Default, Block AI Bots, or E-commerce to set bot access rules. Customize additional settings such as sitemap URL and crawl delay. Once generated, simply copy or download the file for use on your website.

How It Works

Step 1

Set Agents

Define which bots you want to target, starting with '*' for all

Step 2

Add Rules

Allow or disallow specific paths like /admin or /private

Step 3

Preview Output

Review the generated robots.txt structure in real time

Step 4

Save File

Download or copy the result and upload it to your root directory

What is a Robots.txt Generator?

A robots.txt generator is a tool that creates a properly formatted robots.txt file — a plain text file placed at the root of your website that instructs search engine crawlers which pages and directories to crawl or ignore. Robots.txt is one of the most fundamental technical SEO controls available, yet an incorrectly formatted file can accidentally block Google from crawling your entire site or key sections of it. The file uses simple directives like "Disallow" to prevent crawling and "Allow" to explicitly permit it, with rules applied per crawler using the "User-agent" directive. MonitorLinks.io's robots.txt generator helps you create a valid, error-free robots.txt configuration without needing to memorize the syntax or worry about formatting mistakes.

When to Use the Robots.txt Generator?

Setting up a new website

Create a robots.txt file from scratch before your site launches to keep crawlers away from staging pages, admin areas, or duplicate content before you're ready.

Blocking low-value pages from crawling

Disallow crawlers from accessing search results pages, filter parameter URLs, or thin content that wastes your crawl budget without contributing to your site's search presence.

Protecting sensitive site areas

Prevent crawlers from accessing admin panels, login pages, or internal staging environments that should never appear in search results or be exposed to automated bots.

Adding a sitemap reference

Robots.txt is the standard place to include a Sitemap directive pointing to your XML sitemap URL, helping all crawlers locate it without depending solely on Search Console submission.

What is a Robots.txt?

robots.txt is a plain text file that tells search engine crawlers which URLs on a website they may crawl and which they must skip. Website owners place rules inside the file to control crawler access to specific directories, pages, or file types.

Search engines such as Google, Bing, and Yandex read the robots.txt file before crawling a site. The crawler interprets directives and adjusts its crawl behavior.

Why is robots.txt file important?

robots.txt improves crawl efficiency and keeps bots away from non-essential resources. The main purposes of robots.txt include:

  • Controls crawler access to directories, URLs, and file paths
  • Prevents crawling of duplicate or low-value pages such as filters or parameters
  • Reduces server load by limiting unnecessary crawling
  • Guides bots to important resources like XML sitemaps

Example:

User-agent: *
Disallow: /admin/
Disallow: /search/

The example instructs all crawlers to avoid the /admin/ and /search/ directories.
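You can check rules like these programmatically. Python's standard-library `urllib.robotparser` module parses robots.txt text and answers can-fetch queries; the sketch below feeds it the example file (the `MyBot` user agent is just an illustrative name):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin/
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Any URL under a disallowed directory is blocked for all crawlers
print(parser.can_fetch("MyBot", "https://example.com/admin/settings"))  # False
print(parser.can_fetch("MyBot", "https://example.com/blog/post"))       # True
```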

What are the Requirements of a Robots.txt File?

A robots.txt file must be a plain UTF-8 text file placed in the root directory of a website. Crawlers only check this exact location.

The robots.txt file requirements are presented below.

Attribute      Requirement
File name      robots.txt
File format    Plain text (.txt)
Location       Root directory of the domain
Encoding       UTF-8 recommended

Example valid URL:

https://example.com/robots.txt

Invalid placements include:

https://example.com/files/robots.txt
https://example.com/robots.txt.txt

Search engines only request the file from the root path. If the file does not exist there, crawlers assume no restrictions.
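The root-only requirement is easy to validate before deployment. A minimal sketch using Python's standard `urllib.parse` (the helper name `is_valid_robots_location` is our own):

```python
from urllib.parse import urlparse

def is_valid_robots_location(url: str) -> bool:
    # Crawlers only request robots.txt at the exact root path of the host
    return urlparse(url).path == "/robots.txt"

print(is_valid_robots_location("https://example.com/robots.txt"))        # True
print(is_valid_robots_location("https://example.com/files/robots.txt"))  # False
print(is_valid_robots_location("https://example.com/robots.txt.txt"))    # False
```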

Rules to follow when adding a robots.txt for your site:

  • Use exact file name robots.txt
  • Place the file in the top-level root directory
  • Ensure the file returns HTTP 200 status
  • Avoid redirects or blocked access

What are Robots.txt file's Directives and Syntax?

robots.txt directives define crawler permissions using specific rules and structured syntax. Each rule contains a directive followed by a path value.

What does User-agent mean?

User-agent specifies which crawler the rules apply to.

User-agent: Googlebot
User-agent: Bingbot
User-agent: *

  • Googlebot → crawler from Google
  • Bingbot → crawler from Microsoft
  • * → applies rules to all crawlers

What does Disallow mean?

Disallow blocks crawlers from accessing specific paths.

Disallow: /private/

The rule prevents bots from crawling any URL inside /private/.

What does Allow mean?

Allow permits crawlers to access a path even if a parent directory is blocked.

Disallow: /images/
Allow: /images/public/

The rule blocks /images/ but allows /images/public/.
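This Allow/Disallow interplay can be verified with Python's standard `urllib.robotparser`. One caveat in this sketch: Google resolves conflicts by the most specific (longest) matching path, while Python's parser applies the first matching rule in file order, so the Allow line is listed first here:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /images/public/
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# /images/public/ stays reachable even though its parent /images/ is blocked
print(parser.can_fetch("*", "https://example.com/images/public/logo.png"))  # True
print(parser.can_fetch("*", "https://example.com/images/raw/photo.jpg"))    # False
```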

What does Sitemap mean?

Sitemap specifies the location of the XML sitemap file for search engines.

Sitemap: https://example.com/sitemap.xml

The directive helps crawlers discover URLs faster. You can generate this file using an XML Sitemap Generator.
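Since Python 3.8, `urllib.robotparser` also exposes Sitemap lines via `site_maps()`, which is a quick way to confirm the directive was picked up:

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow:",
    "Sitemap: https://example.com/sitemap.xml",
])

print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```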

What does Crawl-delay mean?

Crawl-delay defines the number of seconds a crawler waits between requests.

Crawl-delay: 10

The rule instructs bots to wait 10 seconds between page requests. Not all search engines support this directive: Google ignores Crawl-delay entirely, while crawlers such as Bingbot respect it.
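`urllib.robotparser` reads Crawl-delay too, via `crawl_delay()`, so you can confirm what delay-respecting bots would see:

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Crawl-delay: 10",
])

print(parser.crawl_delay("*"))  # 10
```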

What are required directives in a robots.txt file?

A minimal robots.txt file usually contains:

  • User-agent → defines the crawler
  • Disallow → defines blocked paths

User-agent: *
Disallow:

This configuration allows crawling of the entire website, because an empty Disallow value blocks nothing.

What are optional but recommended directives?

Additional directives improve crawler guidance:

  • Allow: defines exceptions to blocked paths
  • Sitemap: helps crawlers discover site URLs
  • Crawl-delay: reduces server load for supported bots

User-agent: *
Disallow: /admin/
Allow: /admin/admin-ajax.php
Sitemap: https://example.com/sitemap.xml

If your website doesn't have an XML Sitemap, you can create one for free with our XML Sitemap Generator.
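Putting the directives together, the core of a robots.txt generator is simple string assembly. A minimal sketch (the `build_robots_txt` helper and its argument shape are our own illustration, not MonitorLinks.io's implementation):

```python
def build_robots_txt(groups, sitemap=None):
    """Assemble robots.txt text from per-crawler rule groups.

    groups: list of (user_agent, [(directive, path), ...]) tuples.
    sitemap: optional absolute URL of the XML sitemap.
    """
    lines = []
    for agent, rules in groups:
        lines.append(f"User-agent: {agent}")
        for directive, path in rules:
            lines.append(f"{directive}: {path}")
        lines.append("")  # blank line separates groups
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines).rstrip("\n") + "\n"

print(build_robots_txt(
    [("*", [("Disallow", "/admin/"), ("Allow", "/admin/admin-ajax.php")])],
    sitemap="https://example.com/sitemap.xml",
))
```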

Frequently Asked Questions

What is robots.txt and where should it be placed?

Robots.txt is a text file that tells web crawlers which pages they can and cannot request. It must be placed at the root of your domain (e.g., https://example.com/robots.txt) to be discovered automatically by search engine bots during crawling.

Does robots.txt prevent pages from being indexed?

No. Robots.txt controls crawling, not indexing. A blocked page can still appear in search results if other sites link to it. To prevent indexing specifically, use a noindex meta tag or X-Robots-Tag response header instead of relying on robots.txt.

What happens if I accidentally block Googlebot with robots.txt?

Google will stop crawling the blocked pages immediately. You may see indexed pages drop in Search Console within days. If your entire site is blocked, it can disappear from search results entirely. Always test changes using Search Console's URL inspection tool before deploying.

Can different search engines have different robots.txt rules?

Yes. Using specific "User-agent" directives, you can set different crawl rules for Googlebot, Bingbot, or any other named crawler. Rules under "User-agent: *" apply to every crawler that does not have its own named group; a crawler that does have one follows only that group.
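That per-crawler grouping can be checked with Python's standard `urllib.robotparser`. Note how a crawler with its own named group ignores the `*` group entirely:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow: /beta/

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("Googlebot", "https://example.com/beta/page"))   # False
print(parser.can_fetch("Bingbot", "https://example.com/beta/page"))     # True
# Googlebot follows only its own group, so the * rule does not apply to it
print(parser.can_fetch("Googlebot", "https://example.com/admin/page"))  # True
```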

How do I know if my robots.txt is working correctly?

Use the robots.txt report in Google Search Console (which replaced the older robots.txt Tester) to confirm that Google can fetch and parse your file, and use the URL Inspection tool to check whether specific URLs are allowed or blocked before changes cause crawling issues on your live site.
