What is Robots.txt – A quick tutorial
Continuing with our Basic SEO Tutorials, today we will take an in-depth look at the robots.txt file.
What is robots.txt?
- robots.txt is a simple text file on your website that tells search engine bots how to crawl and index the site or individual pages.
- By default, search engine bots crawl everything they can reach unless they are forbidden from doing so. They always check the robots.txt file before crawling a website.
- Declaring rules in robots.txt asks bots not to crawl or index sensitive data, but it doesn’t prevent them from doing so. Well-behaved bots follow the instructions, but malware bots ignore them entirely, so never rely on robots.txt as a security measure for your website.
How to build a robots.txt file (Terms, Structure & Placement)?
The main terms used in a robots.txt file and their meanings are:
- User-agent: the name of the bot the following rules apply to (* matches all bots).
- Disallow: a URL path that the bot should not crawl.
- Allow: a URL path that may be crawled even under a disallowed path (supported by major bots such as Googlebot).
- Sitemap: the full URL of your XML sitemap.
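As an illustration, a minimal robots.txt using these terms might look like the sketch below (the paths and sitemap URL are placeholders, not rules from any real site):

```
User-agent: *
Disallow: /login/
Allow: /login/help.html
Sitemap: http://www.example.com/sitemap.xml
```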
The robots.txt file is placed in the root folder of your website, so that its URL resembles www.example.com/robots.txt in the web browser. Remember to use all lowercase letters for the filename.
You can define different restrictions for different bots by applying bot-specific rules, but be aware that the more complicated the file becomes, the harder it is to spot its traps. List bot-specific rules before the common rules: a bot looks through the file for a group matching its own name and falls back to the common rules only if none is found.
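For example, a file that gives Googlebot its own rule set alongside common rules for every other bot could look like this (a hypothetical sketch; /drafts/ and /private/ are placeholder paths):

```
# Rules that apply only to Googlebot
User-agent: Googlebot
Disallow: /drafts/

# Common rules for all other bots
User-agent: *
Disallow: /private/
Disallow: /drafts/
```

Note that a bot obeys only the group matching its name: Googlebot here ignores the * group entirely, so any path it must also skip has to be repeated inside its own group.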
You can check ours and many other sites’ robots.txt files to get a feel for how these are generally implemented: http://www.searchenabler.com/robots.txt
Example scenarios for robots.txt
If you take a close look at the SearchEnabler robots.txt, you will notice that we have blocked the following pages from search indexing. Analyze which pages and links should be blocked on your own website. As a general rule, we advise hiding pages such as internal search results, user logins, profiles, logs and styling CSS sheets.
1. Disallow: /?s=
It is a dynamic search results page, and there is no point in indexing it; doing so would create duplicate content problems.
2. Disallow: /blog/2010/
These are blog posts categorized by year; they are blocked because different URLs pointing to the same web page would cause duplicate content errors.
3. Disallow: /login/
It is a login page meant only for users of the SearchEnabler tool, so it is blocked from being crawled.
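You can verify rules like these with the robots.txt parser in Python’s standard library. The sketch below pastes the three rules inline rather than fetching them over HTTP; the /features/ URL is a made-up example path, not a real SearchEnabler page:

```python
# Check the three Disallow rules above with the stdlib robots.txt parser.
from urllib.robotparser import RobotFileParser

rules = """
User-agent: *
Disallow: /?s=
Disallow: /blog/2010/
Disallow: /login/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# can_fetch(user_agent, url) returns True if the URL may be crawled.
print(parser.can_fetch("*", "http://www.searchenabler.com/login/"))    # False (blocked)
print(parser.can_fetch("*", "http://www.searchenabler.com/?s=seo"))    # False (blocked)
print(parser.can_fetch("*", "http://www.searchenabler.com/features/")) # True (hypothetical allowed path)
```

Running such a check before deploying a new robots.txt helps catch rules that accidentally block pages you want indexed.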
How does robots.txt affect search results?
By using the robots.txt file, you can keep pages such as user profiles and temporary folders out of the index, so that your SEO effort is not diluted by junk pages that are useless in search results. In general, your results will be more precise and carry more value.
How can tools help you in maintaining robots.txt?
The SearchEnabler SEO Platform reports to its users on a weekly basis which links are being blocked by the robots.txt file. You can check from time to time that the pages which are meant to be crawled are not accidentally blocked.