How do search engines access and index your website? Is Google the king of web search on the planet? Sharing here :)

Posted: Jan 24, 2010




The key is a simple file called robots.txt that has been an industry standard for many years. It lets a site owner control how search engines access their website. With robots.txt you can control access at multiple levels: the entire site, individual directories, pages of a specific type, or individual pages. Effective use of robots.txt gives you a lot of control over how your site is crawled, but it’s not always obvious how to achieve exactly what you want. This is the first of a series of posts on how to use robots.txt to control access to your content.
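To make the idea of "multiple levels" concrete, here is a minimal sketch that feeds a small robots.txt to Python's standard-library `urllib.robotparser` and asks which URLs a crawler may fetch. The host `example.com`, the paths, and the crawler names `SomeBot` and `ExampleBot` are hypothetical. This parser does plain prefix matching and does not implement the `*`/`$` wildcard extensions some search engines support for matching pages of a specific type, so only the site, directory, and single-page levels are shown.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt showing three levels of control.
ROBOTS_TXT = """\
# Every crawler: keep out of one directory and one individual page.
User-agent: *
Disallow: /private/
Disallow: /drafts/old-post.html

# One (hypothetical) crawler: keep out of the entire site.
User-agent: ExampleBot
Disallow: /
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` on the hypothetical host."""
    return rules.can_fetch(agent, "https://example.com" + path)


if __name__ == "__main__":
    for agent, path in [
        ("SomeBot", "/index.html"),            # allowed: no rule matches
        ("SomeBot", "/private/report.html"),   # blocked: directory rule
        ("SomeBot", "/drafts/old-post.html"),  # blocked: single-page rule
        ("SomeBot", "/drafts/new-post.html"),  # allowed: a different page
        ("ExampleBot", "/index.html"),         # blocked: whole-site rule
    ]:
        print(f"{agent:10} {path:25} {allowed(agent, path)}")
```

The `User-agent: *` group applies to any crawler that has no group of its own, which is why `SomeBot` is governed by it while `ExampleBot` follows only its dedicated group.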

What does robots.txt do?

http://rewebmaster.blogspot.com
http://googlinger.blogspot.com

The web is big. Really big. You just won’t believe how vastly hugely mind-bogglingly big it is. I mean, you might think it’s a lot of work maintaining your website, but that’s just peanuts to the whole web. (with profound apologies to Douglas Adams)

Search engines like Google read through all this information and create an index of it. The index allows a search engine to take a user’s query and show the pages on the web that match it.
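At its simplest, such an index can be pictured as an inverted index: a map from each word to the pages that contain it. The toy sketch below only illustrates that idea under loud simplifying assumptions (made-up pages, lowercase whitespace tokenization, and a query matching pages that contain every query word); Google's real index and ranking are far more elaborate and are not described in this article.

```python
from collections import defaultdict

# Hypothetical pages: URL -> page text.
PAGES = {
    "https://example.com/robots": "robots txt controls crawling",
    "https://example.com/meta": "meta tags control indexing",
    "https://example.com/both": "robots meta tags",
}


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())  # keep pages matching all words
    return results


if __name__ == "__main__":
    idx = build_index(PAGES)
    print(sorted(search(idx, "robots meta")))
```

Answering a query then means intersecting a few precomputed sets instead of rereading every page, which is the point of building the index ahead of time.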

To do this, Google has a set of computers that continually crawl the web. They work from a list of all the websites that Google knows about and read the pages on each of those sites. Together these machines are known as the Googlebot. In general you want Googlebot to access your site so your web pages can be found by people searching on Google.
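The crawl loop described above can be sketched as a breadth-first traversal that checks robots.txt before fetching each page. Everything here is hypothetical and offline: the `WEB` dictionary stands in for real HTTP fetches and link extraction, and a single hard-coded robots.txt covers the one made-up host. It is meant to show the order of operations, not how Googlebot is actually built.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "web": each URL maps to the URLs it links to.
WEB = {
    "https://example.com/": [
        "https://example.com/news/",
        "https://example.com/logs/today.log",
    ],
    "https://example.com/news/": [
        "https://example.com/news/story.html",
        "https://example.com/",
    ],
    "https://example.com/news/story.html": [],
    "https://example.com/logs/today.log": [],
}

# Hypothetical robots.txt for the single host above.
ROBOTS_TXT = ["User-agent: Googlebot", "Disallow: /logs/"]


def crawl(seeds: list[str], agent: str = "Googlebot") -> list[str]:
    """Visit pages breadth-first, skipping any URL robots.txt disallows."""
    rules = RobotFileParser()
    rules.parse(ROBOTS_TXT)
    frontier = deque(seeds)
    seen = set(seeds)
    fetched = []
    while frontier:
        url = frontier.popleft()
        if not rules.can_fetch(agent, url):
            continue  # disallowed: never "download" this page
        fetched.append(url)  # a real crawler would fetch and index it here
        for link in WEB.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return fetched


if __name__ == "__main__":
    for page in crawl(["https://example.com/"]):
        print(page)
```

The check happens before the fetch, so a disallowed page's content is never read at all.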

However, you may have a few pages on your site you don’t want in Google’s index. For example, you might have a directory that contains internal logs, or you may have news articles that require payment to access. You can exclude pages from Google’s crawler by creating a text file called robots.txt and placing it in your site’s root directory. The robots.txt file lists the pages that search engines shouldn’t access. Creating a robots.txt file is straightforward, and it gives you a sophisticated level of control over how search engines can access your website.
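Crawlers look for the file only at the root of a host, so its address can be derived from any page URL by keeping the scheme and host (including any port) and replacing the path with `/robots.txt`. A minimal sketch using the standard library; `robots_txt_url` is a hypothetical helper name and the URLs are made up.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address that governs `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host[:port]; drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("https://www.example.com/news/2010/story.html?id=7"))
```

With network access, the result could be passed to `urllib.robotparser.RobotFileParser.set_url()` followed by `read()` to download and parse the live file. A robots.txt placed in a subdirectory, such as `/news/robots.txt`, is not where crawlers look.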

Fine-grained control
In addition to the robots.txt file, which lets you concisely specify instructions for a large number of files on your website, you can use the robots META tag for fine-grained control over individual pages on your site. To implement this, add specific META tags to HTML pages to control how each individual page is indexed. Together, robots.txt and META tags give you the flexibility to express complex access policies relatively easily.

A simple example
Here is a simple example of a robots.txt file.

User-Agent: Googlebot
Disallow: /logs/

The User-Agent line specifies that the next section is a set of instructions just for the Googlebot. All the major search engines read and obey the instructions you put in robots.txt, and you can specify different rules for different search engines if you want to. The Disallow line tells Googlebot not to access files in the logs sub-directory of your site. The contents of the pages you put into the logs directory will not show up in Google search results.
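One way to check how a crawler reads this example is Python's `urllib.robotparser`, which implements the classic robots.txt rules (it is not Google's own parser, so edge cases can differ). The URLs below are hypothetical. Because the only group names Googlebot, a crawler that identifies itself differently, here Bingbot, finds no matching group and is allowed everywhere.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt example from this article: a single group for Googlebot only.
EXAMPLE = """\
User-Agent: Googlebot
Disallow: /logs/
"""

parser = RobotFileParser()
parser.parse(EXAMPLE.splitlines())

URLS = [
    "https://example.com/index.html",       # hypothetical ordinary page
    "https://example.com/logs/access.log",  # hypothetical file under /logs/
]

if __name__ == "__main__":
    for agent in ("Googlebot", "Bingbot"):
        for url in URLS:
            print(agent, url, parser.can_fetch(agent, url))
```

To keep every crawler out of `/logs/`, the group would start with `User-Agent: *` instead of naming Googlebot.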

Preventing access to a file
If you have a news article on your site that is only accessible to registered users, you’ll want it excluded from Google’s results. To do this, add a META tag to the HTML file, so it starts something like:

<html>
<head>
<meta name="googlebot" content="noindex">

This stops Google from indexing this file. META tags are particularly useful if you have permission to edit the individual files but not the site-wide robots.txt. They also allow you to specify complex access-control policies on a page-by-page basis.
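A crawler has to read the page itself to see such a tag. Here is a minimal sketch with Python's `html.parser`, assuming the convention that both `<meta name="robots">` (all crawlers) and `<meta name="googlebot">` (Google only) are honored and that `noindex` or `none` in the `content` attribute means "do not index". The class and function names are hypothetical, and the sample page is made up.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="AGENT"> tags."""

    def __init__(self, agent: str = "googlebot") -> None:
        super().__init__()
        self.names = {"robots", agent.lower()}
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() in self.names:
            for directive in values.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def is_indexable(html: str, agent: str = "googlebot") -> bool:
    """Return False if the page asks `agent` (or all robots) not to index it."""
    parser = RobotsMetaParser(agent)
    parser.feed(html)
    parser.close()
    # "none" is treated as shorthand for "noindex, nofollow".
    return "noindex" not in parser.directives and "none" not in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="googlebot" content="noindex"></head></html>'
    print(is_indexable(page))
```

Unlike robots.txt, the page is still fetched; the tag only tells the crawler what to do with it afterwards.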

Learn more
You can find out more about robots.txt at http://www.robotstxt.org

Retrieved from http://www.articlesbase.com/blogging-articles/how-search-engines-access-and-index-your-website-are-google-is-the-king-of-the-web-search-in-the-planet-sharing-here–1774663.html

(ArticlesBase SC #1774663)


About the Author: febry
