



There are millions of websites on the internet, and the number is growing fast. In such a scenario, we need to think about some good strategies that make your site visible to the web world.
Here we can have a chat about this.

So, guys, let's start with search engines.

What is a Web Search Engine

A Web search engine is a search engine designed to search for information on the World Wide Web. Information may consist of web pages, images and other types of files.

Commonly used search engines include Yahoo, Google, MSN, and AltaVista.

How Web Search Engines Work

A search engine operates in the following order:
  1. Web crawling
  2. Indexing
  3. Searching

Web search engines work by storing information about many web pages, which they retrieve from the WWW itself. These pages are retrieved by a Web crawler (sometimes also known as a spider), an automated Web browser that follows every link it sees. The contents of each page are then analyzed to determine how it should be indexed (for example, words are extracted from the titles, headings, or special fields called meta tags). Data about web pages is stored in an index database for use in later queries. Some search engines, such as Google, store all or part of the source page (referred to as a cache) as well as information about the web pages, whereas others, such as AltaVista, store every word of every page they find.

When a user enters a query into a search engine (typically by using key words), the engine examines its index and provides a listing of best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text.
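The indexing and searching steps can be sketched in a few lines of Python. This is only a toy illustration, not how any real search engine is implemented: the `PAGES` dictionary, the example.com URLs, and the function names are all made up for the example, and real engines also rank results, which this sketch skips.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for documents a crawler has already fetched.
PAGES = {
    "http://example.com/": "Welcome to our site about search engines",
    "http://example.com/seo": "Search engine optimization tips for your site",
    "http://example.com/contact": "Contact us by email",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query, sorted by URL."""
    words = tokenize(query)
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)

index = build_index(PAGES)
print(search(index, "search site"))
```

The index maps words to pages, so a query only has to look up its own words instead of re-reading every page.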

A web crawler is a program that automatically traverses the web by downloading documents and following links from page to page. Crawlers are mainly used by web search engines to gather data for indexing. Web crawlers are also known as spiders, robots, bots, etc.
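Here is a minimal sketch of that "download a page, follow its links" loop in Python. To keep it runnable without a network connection, it crawls a made-up in-memory site (`FAKE_WEB`, an assumption for this example) instead of downloading real pages. The `crawl` function and `LinkExtractor` class are illustrative names, not part of any real crawler.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://example.com/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "http://example.com/about": '<a href="/">Home</a>',
    "http://example.com/blog": '<a href="/about">About</a> <a href="/missing">Gone</a>',
}

class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch):
    """Breadth-first crawl; returns the URLs in the order they were visited."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # the page could not be downloaded
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited

print(crawl("http://example.com/", FAKE_WEB.get))
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other.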

How Crawlers/Spiders work

Crawler-based search engines have three major elements. First is the spider, also called the crawler. The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being "spidered" or "crawled." The spider returns to the site on a regular basis, such as every month or two, to look for changes.

Everything the spider finds goes into the second part of the search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, then this book is updated with new information.

Sometimes it can take a while for new pages or changes that the spider finds to be added to the index. Thus, a web page may have been "spidered" but not yet "indexed." Until it is indexed -- added to the index -- it is not available to those searching with the search engine.

Search engine software is the third part of a search engine. This is the program that sifts through the millions of pages recorded in the index to find matches to a search and rank them in order of what it believes is most relevant.

How to exclude site pages from Indexing

Pages can be excluded by using a robots.txt file.

Based on the rules in robots.txt, crawlers that honor the file will skip the specified files or directories, so they stay hidden from indexing.

A sample robots.txt file

Don't create this file using a word processor; use a plain-text editor instead.

The sample was created using a robots.txt generator.

Here is what your robots.txt file should look like:


# Robots.txt file created by
# For domain:

# Disallow Crawler V 0.2.1
User-agent: Crawler V 0.2.1
Disallow: /

# Disallow Scooter/1.0
User-agent: Scooter/1.0
Disallow: /

# All other robots may spider the domain,
# except the directories /cgi-bin/ and /images/
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/


Put this file in the root directory of your site.
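If you want to check how a crawler might read such rules, Python's standard `urllib.robotparser` module can parse them. The sketch below is an illustration under assumptions: it uses rules similar to the sample above, written as a single `User-agent: *` group, and it drops the version suffix from the Scooter name as a simplification. The example.com URLs and the "OtherBot" name are made up. Other crawlers may interpret the same file differently.

```python
from urllib.robotparser import RobotFileParser

# Rules similar to the sample above. The agent name without a version
# suffix and the single "*" group are assumptions made for this sketch.
ROBOTS_TXT = """\
# Block one crawler entirely
User-agent: Scooter
Disallow: /

# Everyone else may crawl the site except these directories
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
"""

def build_parser(robots_text):
    """Parse robots.txt content given as a string."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)
BASE = "http://example.com"  # hypothetical domain

checks = [
    ("Scooter", "/index.html"),
    ("OtherBot", "/index.html"),
    ("OtherBot", "/cgi-bin/search.cgi"),
    ("OtherBot", "/images/logo.png"),
]
for agent, path in checks:
    allowed = parser.can_fetch(agent, BASE + path)
    print(agent, path, "allowed" if allowed else "blocked")
```

Keep in mind that robots.txt is only a request: well-behaved crawlers follow it, but it does not protect the files from anyone who asks for them directly.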
