SEO TECHNIQUES - Part 1


Friday, June 4, 2010



There are millions of Web sites on the internet, and the number is growing fast. In such a scenario, we need to think about some good strategies that make your site visible to the Web world.
Here we can have a chat about this.

So guys, let's start with search engines.

What is a Web Search Engine

A Web search engine is a search engine designed to search for information on the World Wide Web. Information may consist of web pages, images and other types of files.

Commonly used search engines include Yahoo, Google, MSN, and AltaVista.

How Web Search Engines Work

A search engine operates in the following order:
  1. Web crawling
  2. Indexing
  3. Searching

Web search engines work by storing information about many web pages, which they retrieve from the WWW itself. These pages are retrieved by a Web crawler (sometimes also known as a spider), an automated Web browser that follows every link it sees. The contents of each page are then analyzed to determine how it should be indexed (for example, words are extracted from the titles, headings, or special fields called meta tags). Data about web pages is stored in an index database for use in later queries. Some search engines, such as Google, store all or part of the source page (referred to as a cache) as well as information about the web pages, whereas others, such as AltaVista, store every word of every page they find.
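To make the analysis step a little more concrete, here is a minimal sketch in Python of pulling out the fields mentioned above (title, headings, and meta tags) using the standard library's html.parser. The class name FieldExtractor, the helper extract_fields, and the sample page are made up for illustration; real search engines do far more than this.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collects text from <title>, <h1>-<h6>, and <meta name=... content=...>."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.fields = {"title": [], "headings": [], "meta": {}}
        self._current = None  # name of the field being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._current = "title"
        elif tag in self.HEADINGS:
            self._current = "headings"
        elif tag == "meta":
            attrs = dict(attrs)
            name, content = attrs.get("name"), attrs.get("content")
            if name and content:
                self.fields["meta"][name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title" or tag in self.HEADINGS:
            self._current = None

    def handle_data(self, data):
        # Only keep text that appears inside a title or heading.
        if self._current and data.strip():
            self.fields[self._current].append(data.strip())


def extract_fields(html):
    """Return the title, headings, and meta tags found in an HTML string."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields


# A made-up page, used only for this example.
SAMPLE_PAGE = """
<html><head>
<title>SEO Techniques</title>
<meta name="keywords" content="seo, search engines">
</head><body>
<h1>How Web Search Engines Work</h1>
<p>Crawling, indexing, searching.</p>
</body></html>
"""

fields = extract_fields(SAMPLE_PAGE)
print(fields["title"])
print(fields["headings"])
print(fields["meta"])
```

The words collected this way are what an indexer would typically store against the page's address.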

When a user enters a query into a search engine (typically by using key words), the engine examines its index and provides a listing of best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text.

A web crawler is a program that automatically traverses the web by downloading documents and following links from page to page. Crawlers are mainly used by web search engines to gather data for indexing. Web crawlers are also known as spiders, robots, or bots.

How Crawlers/Spiders work

Crawler-based search engines have three major elements. First is the spider, also called the crawler. The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being "spidered" or "crawled." The spider returns to the site on a regular basis, such as every month or two, to look for changes.
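The "visit a page, read it, follow its links" loop can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function crawl, the fetch callback, and the FAKE_SITE dictionary are hypothetical, and the "site" lives in memory so that the example runs without touching the network. A real spider would download pages over HTTP and revisit them on a schedule.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl that stays on the host of start_url.

    fetch(url) is assumed to return the page's HTML, or None if the
    page cannot be retrieved.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# A made-up three-page site standing in for the real web.
FAKE_SITE = {
    "http://example.com/": '<a href="/about.html">About</a> '
                           '<a href="http://other.example/">Elsewhere</a>',
    "http://example.com/about.html": '<a href="/">Home</a> '
                                     '<a href="contact.html">Contact</a>',
    "http://example.com/contact.html": "<p>Contact us</p>",
}

pages = crawl("http://example.com/", FAKE_SITE.get)
print(sorted(pages))
```

The seen set is what keeps the spider from going around in circles when pages link back to each other.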

Everything the spider finds goes into the second part of the search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, then this book is updated with new information.

Sometimes it can take a while for new pages or changes that the spider finds to be added to the index. Thus, a web page may have been "spidered" but not yet "indexed." Until it is indexed -- added to the index -- it is not available to those searching with the search engine.

Search engine software is the third part of a search engine. This is the program that sifts through the millions of pages recorded in the index to find matches to a search and rank them in order of what it believes is most relevant.
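As a rough sketch of the index and the search software working together, the Python below builds a tiny word-to-page index and ranks matches by how often the query words appear. The function names, the sample pages, and the word-count scoring are all assumptions made for this example; real engines use far more sophisticated relevance criteria.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to a Counter of {url: number of occurrences}."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] += 1
    return index


def search(index, query):
    """Return URLs containing every query word, best match first."""
    words = tokenize(query)
    if not words:
        return []
    # Keep only the pages that contain all of the query words.
    candidates = set(index.get(words[0], {}))
    for word in words[1:]:
        candidates &= set(index.get(word, {}))
    # Score each page by the total count of query words it contains.
    scores = {url: sum(index[w][url] for w in words) for url in candidates}
    return sorted(scores, key=lambda url: (-scores[url], url))


# Made-up page texts, as if the spider had already collected them.
PAGES = {
    "a.html": "search engines crawl the web and search engines index pages",
    "b.html": "a spider is a web crawler",
    "c.html": "search tips for the web",
}

index = build_index(PAGES)
print(search(index, "search web"))
print(search(index, "spider"))
```

A page that has been spidered but not yet passed to build_index would simply not show up in any search, which is the "spidered but not indexed" situation described earlier.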

How to exclude site pages from Indexing

Pages can be excluded by using a robots.txt file.

Crawlers that follow the rules in robots.txt will skip the files or directories it lists, so those stay out of the index.

A sample robots.txt file

Don't create this file in a word processor; use a plain text editor so that it is saved as plain text.

This sample was created using a robots.txt generator.

Here is what your robots.txt file should look like:


# Robots.txt file created by http://www.webtoolcentral.com
# For domain:

# Disallow Crawler V 0.2.1 admin@crawler.de
User-agent: Crawler V 0.2.1 admin@crawler.de
Disallow: /

# Disallow Scooter/1.0
User-agent: Scooter/1.0
Disallow: /

# All other robots may spider the domain, except /cgi-bin/ and /images/
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/


Put this file in the root directory of your site, so that it is reachable at /robots.txt.
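If you want to check what a set of rules allows before uploading the file, Python's standard urllib.robotparser module can evaluate them. The sketch below is only an illustration: the rules are a simplified version of the sample (the blocked robot is written as the bare name Scooter, without a version number), and the bot name AnyBot and the example.com URLs are made up.

```python
from urllib.robotparser import RobotFileParser

# Simplified rules for this example; the agent names are assumptions.
RULES = """\
User-agent: Scooter
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
"""


def build_parser(rules_text):
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


parser = build_parser(RULES)

# Scooter is shut out of the whole site.
print(parser.can_fetch("Scooter", "http://www.example.com/index.html"))

# Other robots are kept out of /cgi-bin/ and /images/ only.
print(parser.can_fetch("AnyBot", "http://www.example.com/cgi-bin/form.cgi"))
print(parser.can_fetch("AnyBot", "http://www.example.com/images/logo.png"))
print(parser.can_fetch("AnyBot", "http://www.example.com/about.html"))
```

Keep in mind that robots.txt is a request, not a lock: well-behaved crawlers honor it, but nothing forces a robot to obey.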

