The robots.txt file is used by websites to tell web crawlers and bots which parts of the site should not be crawled or accessed. It is a plain-text file served at the root of the host (for example, https://www.example.com/robots.txt). Here are the main directives you can use in a robots.txt file, along with detailed explanations and examples:
**User-agent**

The `User-agent` directive specifies which crawler the rules that follow apply to. Use `*` to apply the rules to all crawlers.

```plaintext
User-agent: *
```

This means that the following rules apply to all web crawlers.

**Disallow**

The `Disallow` directive tells the specified crawlers which paths they should not access.

```plaintext
User-agent: *
Disallow: /private/
```
This tells all user agents that they should not access any URL that begins with `/private/`.

**Allow**

The `Allow` directive is used to counteract a `Disallow` directive, specifically when a directory is blocked but you want to allow a specific page within that directory.

```plaintext
User-agent: *
Disallow: /private/
Allow: /private/public-info.html
```
This means that all user agents are disallowed from accessing `/private/` but can access `/private/public-info.html`.

**Sitemap**

You can also use robots.txt to advertise the location of your XML sitemap.

```plaintext
Sitemap: https://www.example.com/sitemap.xml
```

This line tells crawlers where to find the sitemap.

**Crawl-delay**

The `Crawl-delay` directive asks a crawler to wait a number of seconds between requests.

```plaintext
User-agent: Googlebot
Crawl-delay: 10
```
This tells Googlebot to wait 10 seconds between requests. Note that not all crawlers honor this directive; Googlebot, for example, ignores it.

**Noindex**

Although not part of the official robots.txt specification, some users include a `Noindex` directive to indicate that a page should not be indexed. A more reliable way to prevent indexing is through meta robots tags or HTTP headers.

```plaintext
User-agent: *
Disallow: /noindex-directory/
Noindex: /noindex-directory/page.html
```

This is not recommended, as the `Noindex` directive will not be recognized by most crawlers.

**Multiple user agents**

You can define separate groups of rules for different crawlers:

```plaintext
User-agent: Googlebot
Disallow: /nogoogle/

User-agent: Bingbot
Disallow: /nobing/

User-agent: *
Disallow: /private/
```

Here, Googlebot is disallowed from accessing `/nogoogle/`, Bingbot is disallowed from `/nobing/`, and all other bots are disallowed from `/private/`.

**Wildcards**

You can use `*`
and `$` as wildcards in `Disallow` paths: `*` matches any sequence of characters, and `$` anchors the pattern to the end of the URL.

```plaintext
User-agent: *
Disallow: /*.jpg$
```
This disallows all URLs that end with `.jpg`.

**Comments**

You can add comments in the robots.txt file for clarification. Comments start with a `#`.

```plaintext
# This is a comment
User-agent: *
Disallow: /private/
```
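To sanity-check a rule set like the ones above, you can evaluate it with Python's standard-library `urllib.robotparser`. This is a sketch using the example paths from this guide; `MyBot` is a made-up crawler name, and one caveat applies: Python's parser matches rules in file order (first match wins) rather than by longest path, so the `Allow` line is placed before the `Disallow` line here.

```python
from urllib.robotparser import RobotFileParser

# Rule set built from the guide's examples. The Allow line comes first
# because Python's parser returns the first matching rule.
rules = """\
User-agent: *
Allow: /private/public-info.html
Disallow: /private/
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The blocked directory is off-limits...
print(parser.can_fetch("MyBot", "https://www.example.com/private/data.html"))
# ...but the explicitly allowed page inside it may be crawled.
print(parser.can_fetch("MyBot", "https://www.example.com/private/public-info.html"))
# Crawl-delay for the * group is exposed as well.
print(parser.crawl_delay("MyBot"))
```

Against a live site, you would instead call `parser.set_url("https://www.example.com/robots.txt")` followed by `parser.read()`.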
This comprehensive guide to robots.txt
directives should cover most scenarios you’ll encounter. Remember that while robots.txt
provides direction to crawlers, it is not a security feature and should not be relied upon for protecting sensitive information.
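That caveat can be made concrete: because robots.txt is served publicly, every `Disallow` line is readable by anyone, including people probing for "hidden" areas of a site. A short Python sketch (using a made-up rule set) shows how easily the disallowed paths can be harvested:

```python
# robots.txt is a public file, so Disallow lines advertise the very
# paths they are meant to keep crawlers away from.
sample = """\
User-agent: *
Disallow: /private/
Disallow: /admin/backup/
"""

disallowed = [
    line.split(":", 1)[1].strip()
    for line in sample.splitlines()
    if line.lower().startswith("disallow:")
]
print(disallowed)  # every "hidden" path, collected in one list
```

If a path must stay private, protect it with authentication or remove it from the server; do not rely on robots.txt.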