What is Robots.txt – A quick tutorial

Continuing with our Basic SEO Tutorials, today we take an in-depth look at the mystery of robots.txt.

What is robots.txt?

  • The robots.txt file is a simple text file on your website that tells search engine bots how to crawl and index the site or individual web pages.
  • By default, search engine bots crawl everything they can reach unless they are forbidden from doing so, and they always scan the robots.txt file before crawling a website (see the example below).
  • Declaring rules in robots.txt asks visiting bots not to crawl sensitive pages, but it doesn't mean they can't. Well-behaved bots follow the instructions they are given; malware robots ignore them, so don't rely on robots.txt as a security measure for your website.
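
For example, a robots.txt that allows everything and one that blocks the entire site look like this (a minimal sketch; the * wildcard applies the rules to all bots):

  # Allow every bot to crawl everything (same effect as having no robots.txt at all)
  User-agent: *
  Disallow:

  # Block every bot from crawling the entire site
  User-agent: *
  Disallow: /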

How to build a robots.txt file (Terms, Structure & Placement)

The terms used in a robots.txt file and their meanings are given below.

[Image: meanings of terms in a robots.txt file]
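
Since that table is an image, here is a quick plain-text summary of the standard terms (based on common robots.txt conventions, not copied from the original table):

  User-agent: – names the bot the rules below apply to; * matches all bots
  Disallow:   – a URL path prefix the named bot should not crawl
  Allow:      – an exception to a broader Disallow rule (honored by major bots)
  Sitemap:    – the full URL of your XML sitemap
  #           – starts a comment that bots ignore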

You can get the names and functions of most bots from here. More information is available on the SearchEnabler crawler, Quickobot.

[Image: example of a robots.txt file]

The robots.txt file must be placed in the root folder of your website so that its URL reads www.example.com/robots.txt in the web browser. Remember to use all lowercase letters for the filename.

You can apply different restrictions to different bots with bot-specific rules, but be aware that the more complicated you make the file, the harder it becomes to spot its traps. Always specify bot-specific rules before the common rules, so that a bot reading the file finds the rules matching its name or else falls back to the common rules, as shown below.
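
For instance, a file that gives Googlebot its own rules while every other bot follows the common section might look like this (a hypothetical sketch; the paths are placeholders):

  # Bot-specific rules come first
  User-agent: Googlebot
  Disallow: /drafts/

  # Common rules for all other bots
  User-agent: *
  Disallow: /drafts/
  Disallow: /staging/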

You can check our robots.txt file, and those of many other sites, to get a feel for how these are generally implemented: http://www.searchenabler.com/robots.txt
http://www.google.com/robots.txt
http://searchengineland.com/robots.txt

Example scenarios for robots.txt

If you have a close look at the SearchEnabler robots.txt, you will notice that we have blocked the following pages from search indexing. Analyze which pages and links should be blocked on your own website; as a general rule, we advise hiding pages such as internal search results pages, user logins, profiles, logs, and CSS style sheets.

1. Disallow: /?s=
This is a dynamic search results page; there is no point in indexing it, and doing so would create duplicate content problems.

2. Disallow: /blog/2010/
These are blog posts archived in a year-wise pattern; they are blocked because different URLs pointing to the same web page lead to duplication errors.

3. Disallow: /login/
This is the login page meant only for users of the SearchEnabler tool, so it is blocked from being crawled.
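
Put together, the three rules above would sit in the robots.txt file roughly like this (a sketch assuming they apply to all bots, which the * wildcard expresses):

  User-agent: *
  # 1. Dynamic search results pages
  Disallow: /?s=
  # 2. Year-wise blog archives that duplicate post URLs
  Disallow: /blog/2010/
  # 3. Login page meant only for tool users
  Disallow: /login/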

How does robots.txt affect search results?

By using the robots.txt file, you can keep pages such as user profiles and temporary folders from being indexed, so that your SEO effort is not diluted by junk pages that are useless in search results. In general, your results will be more precise and better valued.

How can tools help you in maintaining robots.txt?

Some good tools for scanning and checking your robots.txt file are Motoricerca robots checker and Frobee Robots.txt checker.

The SearchEnabler SEO Platform reports to its users on a weekly basis which links are being blocked by the robots.txt file. You can check from time to time that pages meant to be crawled are not blocked.
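
You can also run a quick check yourself with Python's built-in urllib.robotparser module (a minimal sketch; the URLs are placeholders):

  from urllib.robotparser import RobotFileParser

  # Download and parse the site's robots.txt
  rp = RobotFileParser()
  rp.set_url("http://www.searchenabler.com/robots.txt")
  rp.read()

  # Ask whether a given bot may fetch a given URL
  print(rp.can_fetch("*", "http://www.searchenabler.com/login/"))  # False if blocked
  print(rp.can_fetch("*", "http://www.searchenabler.com/blog/"))   # True if crawlable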

[Image: webpage denied access from robots.txt error shown in SearchEnabler]


