Your robots.txt file tells search engines which pages they may access and index on your website, and which pages they may not. For example, if you specify in your robots.txt file that you don’t want the search engines to access your thank-you page, that page generally won’t show up in the search results, and web users won’t be able to find it. Keeping search engines away from certain pages on your site is essential for both the privacy of your site and for your SEO. This article explains why that is and shows you how to set up a good robots.txt file.
How Robots.txt Works
Search engines send out small programs called “spiders” or “robots” to crawl your site and bring information back, so that the pages of your site can be indexed in the search results and found by web users. Your robots.txt file tells these programs not to crawl the pages on your site that you designate with a “Disallow” directive. For example, the following robots.txt rule:
User-agent: *
Disallow: /mypage
…would block all search engine robots from visiting the following page on your website:
http://www.yoursite.com/mypage
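You can check how a crawler interprets such a rule with Python’s standard `urllib.robotparser` module. This is a quick sketch; the site name and path are the illustrative ones from the example above:

```python
from urllib import robotparser

# Parse the example rules in memory (no network access needed).
rules = [
    "User-agent: *",
    "Disallow: /mypage",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Every robot is blocked from /mypage but may still fetch other pages.
print(rp.can_fetch("Googlebot", "http://www.yoursite.com/mypage"))  # False
print(rp.can_fetch("Googlebot", "http://www.yoursite.com/other"))   # True
```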
Notice that before the disallow command, you have the command:
User-agent: *
The “User-agent:” line specifies which robot the rules below it apply to, and could also read as follows:
User-agent: Googlebot
This version would block only Google’s robot, while other robots would still have access to the page:
http://www.yoursite.com/mypage
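The same standard-library parser can illustrate the difference. In this sketch only Googlebot is named, so every other robot keeps its access:

```python
from urllib import robotparser

# Rules that name only Googlebot; there is no wildcard "*" group.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: Googlebot",
    "Disallow: /mypage",
])

# Googlebot is blocked, but an unlisted robot falls back to "allowed".
print(rp.can_fetch("Googlebot", "http://www.yoursite.com/mypage"))  # False
print(rp.can_fetch("Bingbot", "http://www.yoursite.com/mypage"))    # True
```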
However, by using the “*” character, you specify that the rules below it apply to all robots. Note that your robots.txt file must be located in the root directory of your site. For example:
http://www.yoursite.com/robots.txt
Why Some Pages Need to Be Blocked
There are two main reasons why you might want to block a page using your robots.txt file. First, if you have a page on your site that is a duplicate of another page, you don’t want the robots to index it, because duplicate content can hurt your SEO. Second, you may have a page that you don’t want users to reach unless they take a specific action, such as a thank-you page that should only appear after a purchase or signup.
An Example Robots.txt File
Here is a sample robots.txt file that gives several specific robots full access, blocks a search results page and a scripts directory, and applies pattern-based rules to everyone else:
User-agent: Mediapartners-Google*
Allow: /

User-agent: Googlebot-Image
Allow: /

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /

User-agent: Twitterbot
Allow: /
Disallow: /search
Disallow: /cgi-bin/
Allow: /

User-agent: *
Disallow: /*.html
Allow: /*.html$
In all of these cases, you’ll need to include a command in your Robots.txt file that tells the search engine spiders not to access that page, not to index it in search results and not to send visitors to it. Let’s look at how you can create a Robots.txt file that will make this possible.
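Assembling such a file can be as simple as joining lines of text. A minimal sketch, using hypothetical paths that stand in for the duplicate and thank-you pages discussed above:

```python
# Paths are illustrative placeholders, not real pages on any site.
blocked_pages = ["/duplicate-page", "/thank-you"]

# Build one wildcard group with a Disallow line per blocked page.
lines = ["User-agent: *"] + [f"Disallow: {page}" for page in blocked_pages]
robots_txt = "\n".join(lines) + "\n"

print(robots_txt)
# To publish it, you would save this text as robots.txt in your
# site's root directory, e.g.:
# with open("robots.txt", "w") as f:
#     f.write(robots_txt)
```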
Installing Your Robots.txt File
Once you have your robots.txt file, upload it to the root (www) directory of your website. You can do this with an FTP program such as FileZilla. The other option is to hire a web programmer to create and install the file for you by telling them which pages you want blocked. If you choose this option, a good web programmer can complete the job in less than an hour.