As hard as it may be to believe, there may be a time when you do not want robots crawling your webpages.
According to robotstxt.org, Web Robots (also known as Web Wanderers, Crawlers, or Spiders) are programs that traverse the Web automatically. Search engines such as Google use them to index web content, spammers use them to scan for email addresses, and they have many other uses.
By specifying where search engines should and should not look for content, you can steer crawlers toward your high-quality directories and files and improve your site's ranking, a practice recommended by Google and the other major search engines.
A robots.txt file tells search engine spiders not to crawl or index certain sections or pages of your site. You can use it to prevent indexing entirely, to keep certain areas of your site from being indexed, or to issue individual indexing instructions to specific search engines. All search engines, or at least all the important ones, now look for a robots.txt file, and there are a number of situations where you may wish to exclude spiders from some or all of your site.

The file itself is plain text and can be created in any text editor, such as Notepad. Each entry has just two lines:
User-Agent: [Spider or Bot name]
Disallow: [Directory or File Name]
This line can be repeated for each directory or file you want to exclude, or for each spider or bot you want to exclude.
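As a sketch of what a file with several entries might look like (the directory names here are hypothetical, and `*` means "all spiders"):

```
User-Agent: Googlebot
Disallow: /cgi-bin/
Disallow: /drafts/

User-Agent: *
Disallow: /private/
```

Each `User-Agent` line starts a new entry, and every `Disallow` line beneath it applies to that spider.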
A few examples will make it clearer.
User-Agent: Googlebot
Disallow: /private/privatefile.htm
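Before deploying a robots.txt file, you can sanity-check its rules with Python's built-in `urllib.robotparser` module. The file contents and URLs below are just illustrative examples matching the entry above:

```python
from urllib import robotparser

# The robots.txt entry from the example above.
rules = """\
User-Agent: Googlebot
Disallow: /private/privatefile.htm
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot is blocked from the private file but may fetch other pages.
print(parser.can_fetch("Googlebot", "http://example.com/private/privatefile.htm"))  # False
print(parser.can_fetch("Googlebot", "http://example.com/index.htm"))                # True
```

This is the same parsing logic crawlers written in Python typically use, so it is a quick way to confirm a rule blocks exactly the paths you intend.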