The Basics behind a Search Engine

On the back end, a search engine is a piece of software that uses specialized applications to collect information about web pages. The information collected usually includes keywords or phrases that indicate what the page as a whole contains, the URL of the page, the code that makes up the page, and the links into and out of the page. That information is then indexed and stored in a database.
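To make that concrete, here is a minimal sketch in Python of collecting that kind of information from a single page. The PageCollector class, the sample HTML and the choice of Python's built-in html.parser are illustrative assumptions; production engines use far more robust parsers.

```python
# A sketch of the kind of data a search engine might collect about one page.
# The class name and sample HTML are hypothetical, for illustration only.
from html.parser import HTMLParser


class PageCollector(HTMLParser):
    """Collects the words and outbound links found in a page's HTML."""

    def __init__(self):
        super().__init__()
        self.words = []   # candidate keywords from the page text
        self.links = []   # URLs the page links out to

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(data.lower().split())


page = '<html><body><h1>Search Basics</h1><a href="http://example.com/next">next page</a></body></html>'
collector = PageCollector()
collector.feed(page)
print(collector.words)  # ['search', 'basics', 'next', 'page']
print(collector.links)  # ['http://example.com/next']
```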

On the front end, the software has a user interface where users enter a search term – a word or phrase – in an attempt to find specific information. When the user clicks a search button, an algorithm then examines the information stored in the back-end database and retrieves links to web pages that appear to match the search term entered.
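A toy version of that front-end flow might look like the following sketch, where handle_search stands in for what happens when the user clicks the search button. The database contents and the function name are made up for illustration.

```python
# A toy front-end flow: the user's search term goes to a lookup over the
# back-end database, and matching links come back. The data is invented.
database = {
    "seo": ["http://example.com/seo-guide", "http://example.com/rankings"],
    "crawler": ["http://example.com/how-crawlers-work"],
}


def handle_search(term):
    """What happens, in miniature, when the user clicks the search button."""
    results = database.get(term.lower().strip(), [])
    return results or ["(no matching pages found)"]


print(handle_search("SEO"))  # the lookup is case-insensitive here
```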

The process of collecting information about web pages is performed by an agent called a crawler, spider or robot. The crawler systematically visits URLs on the Web, collecting the keywords and phrases found on each page, which are then included in the database that powers the search engine.
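The sketch below shows the shape of that crawl loop. To keep it self-contained and runnable, it walks a tiny in-memory "web" instead of fetching real pages over HTTP; the WEB dictionary and its URLs are hypothetical.

```python
# A sketch of a crawl loop, run against a tiny in-memory "web" so it stays
# self-contained; a real crawler would fetch each URL over HTTP instead.
from collections import deque

# Hypothetical pages: URL -> (words on the page, links out of the page)
WEB = {
    "http://example.com/": (["search", "engine", "basics"],
                            ["http://example.com/crawlers"]),
    "http://example.com/crawlers": (["crawlers", "spiders", "robots"],
                                    ["http://example.com/"]),
}


def crawl(start_url):
    """Visit pages breadth-first, recording each page's keywords."""
    seen, queue, collected = set(), deque([start_url]), {}
    while queue:
        url = queue.popleft()
        if url in seen or url not in WEB:
            continue
        seen.add(url)
        words, links = WEB[url]
        collected[url] = words   # hand these off to the indexer
        queue.extend(links)      # follow the links out of the page
    return collected


print(crawl("http://example.com/"))
```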

If you have spent any time on the Web, you may have heard a little about spiders, crawlers and robots. These little Web creatures are programs that crawl around the Web cataloging data so that it can be searched. In the most basic sense, all three terms – crawler, spider and robot – refer to essentially the same kind of program: one that collects information about the web URLs it visits.

Every search engine contains or is connected to a system of databases where data about each URL the engine knows of is stored. These databases are massive storage areas that hold multiple data points about each URL. The data might be arranged in any number of different ways, and it is ranked and retrieved according to a method that is usually proprietary to the company that owns the search engine.
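One common, non-proprietary way to arrange that data is an inverted index, a standard textbook structure that maps each keyword to the set of URLs it appears on. The sketch below assumes the crawler hands over a mapping of URL to words, as in the earlier example.

```python
# An inverted index: each keyword maps to the set of URLs it appears on.
# This is a standard textbook structure, not any engine's proprietary scheme.
def build_index(collected):
    """collected maps URL -> list of words, as a crawler might produce."""
    index = {}
    for url, words in collected.items():
        for word in words:
            index.setdefault(word, set()).add(url)
    return index


pages = {
    "http://example.com/": ["search", "engine", "basics"],
    "http://example.com/crawlers": ["crawlers", "search", "robots"],
}
index = build_index(pages)
print(index["search"])  # {'http://example.com/', 'http://example.com/crawlers'}
```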

A search algorithm is a problem-solving procedure that takes a problem, evaluates a number of possible answers, and then returns the solution to that problem. A search algorithm for a search engine takes the problem (the word or phrase being searched for), sifts through a database that contains cataloged keywords and the URLs those words are related to, and then returns pages that contain the word or phrase that was searched for, either in the body of the page or in a URL that points to the page.
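As a rough sketch, such an algorithm might split the query into words, look each word up in the inverted index, and return only the URLs that contain them all. Real engines layer many more signals on top of this.

```python
# A bare-bones search algorithm over an inverted index: look up each query
# word and keep only the URLs that contain all of them.
def search(index, query):
    results = None
    for word in query.lower().split():
        urls = index.get(word, set())
        results = urls if results is None else results & urls
    return sorted(results or [])


index = {
    "search": {"http://example.com/", "http://example.com/crawlers"},
    "engine": {"http://example.com/"},
}
print(search(index, "search engine"))  # ['http://example.com/']
```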

There are several classifications of search algorithms, and each search engine uses algorithms that are slightly different. That’s why a search for one word or phrase will yield different results from different search engines.

Search engines use various types of search algorithms, and in most cases each engine creates its own proprietary algorithm. The key to maximizing your search engine results is to understand a little about how each search engine you’re targeting works.

For a web search engine, the retrieval of data is a combined activity of the crawler (or spider or robot), the database and the search algorithm. Those three elements work in concert to retrieve the pages that match the word or phrase a user enters into the search engine’s user interface.
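Putting the earlier sketches together shows the three elements in concert: crawl a toy web, index what was collected, and then answer a query against the index. All URLs and data here remain illustrative.

```python
# The three elements in concert: crawl, index, then answer a query.
from collections import deque

WEB = {
    "http://example.com/": (["search", "engine", "basics"],
                            ["http://example.com/crawlers"]),
    "http://example.com/crawlers": (["crawlers", "search", "robots"], []),
}


def crawl(start):
    seen, queue, pages = set(), deque([start]), {}
    while queue:
        url = queue.popleft()
        if url in seen or url not in WEB:
            continue
        seen.add(url)
        words, links = WEB[url]
        pages[url] = words
        queue.extend(links)
    return pages


def build_index(pages):
    index = {}
    for url, words in pages.items():
        for word in words:
            index.setdefault(word, set()).add(url)
    return index


def search(index, query):
    results = None
    for word in query.lower().split():
        urls = index.get(word, set())
        results = urls if results is None else results & urls
    return sorted(results or [])


index = build_index(crawl("http://example.com/"))
print(search(index, "search"))  # both pages contain "search"
```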

The query interface is what most people are familiar with and it’s probably what comes to mind when you hear the term “search engine.” The query interface is the page that users see when they navigate to a search engine to enter a search term.

The query interface and the search engine results page (SERP) are the only parts of a search engine that the user ever sees. Every other part of the search engine operates behind the scenes, out of view of the people who use it every day, yet the back end is the most important part of the search engine.

Although retrieval and ranking are listed as separate subjects, they are actually part of the search algorithm.
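For instance, a retrieval step can keep only the pages that match, and a ranking step can then order them by a score before they are returned. Scoring pages by how often the query words appear, as in the sketch below, is an illustrative stand-in for the proprietary ranking methods real engines use.

```python
# Retrieval and ranking inside one algorithm: keep the matching pages,
# then order them by a score. Counting word occurrences is an illustrative
# scoring choice; real ranking methods are proprietary and far richer.
def ranked_search(pages, query):
    """pages maps URL -> list of words; returns URLs, best match first."""
    words = query.lower().split()
    scores = {}
    for url, page_words in pages.items():
        score = sum(page_words.count(w) for w in words)
        if score > 0:            # retrieval: keep only pages that match
            scores[url] = score  # ranking: remember how well they match
    return sorted(scores, key=scores.get, reverse=True)


pages = {
    "http://example.com/": ["search", "engine", "search", "basics"],
    "http://example.com/crawlers": ["crawlers", "search", "robots"],
}
print(ranked_search(pages, "search"))  # the page saying "search" twice ranks first
```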