Manipulating Search Engine User Agents

Wednesday, March 25th, 2009

A search engine optimisation specialist can tell a search engine user agent (also called a webbot, spider or crawler) what to index and what to leave out, using a robots exclusion protocol file (robots.txt), on-page robots meta tags, or nofollow attributes on individual links. This Web Clinic blog post describes some of the SEO commands you can use to control what a search engine indexes on your ecommerce website…

It is useful to instruct a search engine not to crawl and index a page that is under construction. At the level of a single link, adding the nofollow attribute takes the form <a href="http://www.healthstore.uk.com" rel="nofollow">Health Store</a> and asks search engine spiders not to follow that link.
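As a rough illustration of how a crawler might recognise a nofollow link, here is a minimal Python sketch using only the standard library. The class name NofollowLinkFinder, the helper find_links and the second sample link (/about) are made up for this example; real search engine spiders are far more involved.

```python
from html.parser import HTMLParser


class NofollowLinkFinder(HTMLParser):
    """Collect (href, is_nofollow) pairs for every <a> tag seen."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, e.g. "nofollow external"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        self.links.append((attr_map.get("href"), "nofollow" in rel_tokens))


def find_links(html):
    """Return a list of (href, is_nofollow) tuples found in the HTML."""
    finder = NofollowLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.links


# The second link is a hypothetical one without rel="nofollow".
sample = (
    '<a href="http://www.healthstore.uk.com" rel="nofollow">Health Store</a>'
    '<a href="/about">About</a>'
)
print(find_links(sample))
```

A spider built along these lines would skip any link whose second value is True when deciding which pages to visit next.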

The “nofollow” value can also be used in a robots meta tag placed in the head of a webpage. The following instructs search engines not to index this page and not to follow any of its links for indexing or weighting…

<meta name="robots" content="noindex, nofollow" />

The following tells a spider not to index this page but still to follow its links, so that the linked pages can be indexed and weighted…

<meta name="robots" content="noindex, follow">

The following instructs the spider to index this page but not to follow any links from it, and is most commonly used on message boards…

<meta name="robots" content="index, nofollow">
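To make the three combinations concrete, the sketch below shows one way a program could read a robots meta tag and turn it into an index/follow decision. It is only an illustration in Python's standard library: RobotsMetaParser and robots_policy are names invented here, and the sketch assumes that a missing directive means the permissive default and ignores any other directive values.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Gather the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() != "robots":
            return
        content = attr_map.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, follow"
        self.directives.update(
            token.strip().lower() for token in content.split(",") if token.strip()
        )


def robots_policy(html):
    """Return (may_index, may_follow) for a page's HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    # Assumption: anything not explicitly forbidden is allowed.
    may_index = "noindex" not in parser.directives
    may_follow = "nofollow" not in parser.directives
    return may_index, may_follow


print(robots_policy('<meta name="robots" content="noindex, nofollow" />'))
print(robots_policy('<meta name="robots" content="noindex, follow">'))
print(robots_policy('<meta name="robots" content="index, nofollow">'))
```

Each call prints a pair of True/False values, the first for indexing the page and the second for following its links.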

The “robots exclusion protocol” uses a separate robots.txt file, located in the site’s root directory, to prevent directories from being crawled and indexed.

The following instruction disallows NO directories, so any search engine may crawl the whole site.

User-agent: *
Disallow:

Conversely, the following command will disallow ALL directories for any search engine.

User-agent: *
Disallow: /
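If you want to check how a set of robots.txt rules will be interpreted before publishing them, Python's standard urllib.robotparser module can evaluate them locally. The sketch below is one possible way to do that; the helper can_fetch, the user agent name ExampleBot, the www.example.com URLs and the /private/ directory are all placeholders chosen for illustration, and individual search engines may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt, url, user_agent="ExampleBot"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two rule sets from the post
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Hypothetical rule set that blocks a single directory
BLOCK_PRIVATE = "User-agent: *\nDisallow: /private/\n"

page = "http://www.example.com/private/page.html"
print(can_fetch(ALLOW_ALL, page))
print(can_fetch(BLOCK_ALL, page))
print(can_fetch(BLOCK_PRIVATE, page))
print(can_fetch(BLOCK_PRIVATE, "http://www.example.com/index.html"))
```

The third rule set shows the more typical case, where only one directory is closed off and the rest of the site stays open to crawlers.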
