Ever wonder how Google, Yahoo or MSN works?

A search engine lets you search the content on the Internet for the information you are looking for, from the convenience of your computer. Search engines give users a web page, called a 'form', with a field where they can type a search term and submit it to the search engine. Results are then displayed in a fraction of a second. It's not possible to actually search the entire Internet in that much time, so how do search engines work?

How Search Engines Work

At a very high level, the major search engines are huge collections of highly scalable software and hardware, connected to the Internet via very fast circuits, that search the Internet for content and then index and correlate the information so that the index may be searched by the engine's users. Building a good, fast search engine is an 'interesting computer science problem' and an area of research at many colleges and universities. Such a research project is exactly how Google got started.

The Search Form

The search form is where you type in your query. Some search engines understand boolean search phrases (find me pages that have "this AND that"), support special search keywords (site:, file:, related:) or have other features users find useful. Search engines are generally free to use. To make money and stay in business, they sell advertising space, so they compete fiercely for users and are constantly adding new features and search capabilities.
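To make the idea of boolean phrases and special keywords more concrete, here is a minimal sketch of how a search form's query might be split into plain terms and keyword filters. The function name parse_query and the rule that AND is simply implied between terms are assumptions made for illustration; real search engines have their own, far richer query syntax.

```python
# Hypothetical set of special keywords, taken from the examples in the text.
KNOWN_OPERATORS = {"site", "file", "related"}

def parse_query(query):
    """Split a raw query into plain search terms and keyword filters."""
    terms = []
    operators = {}
    for token in query.split():
        name, sep, value = token.partition(":")
        if sep and value and name.lower() in KNOWN_OPERATORS:
            # A token such as "site:example.com" becomes a filter.
            operators[name.lower()] = value
        elif token.upper() == "AND":
            # In this sketch AND is implied between plain terms, so skip it.
            continue
        else:
            terms.append(token.lower())
    return {"terms": terms, "operators": operators}

parsed = parse_query("routers AND switches site:example.com")
print(parsed)
```

Here the query is reduced to two terms that must both match, plus a filter that limits results to one website.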

The Website Submission Form

Website owners, and people who are just plain helpful, submit website addresses through the search engine provider's website submission form. Addresses submitted through this form are added to the search engine's list of 'starting pages', giving the search engine multiple starting points for searching the World Wide Web. Most search engine providers also automatically add a large list of popular websites.

Crawlers or spiders start with the list of addresses submitted to the search engine by website owners. They read the pages at those addresses and find links, then follow each link to the page it points to. This is how search engines discover web pages on the Internet. If your page hasn't been submitted to a search engine and no page the crawler already knows about links to it, the crawler has no way to find it, and your website will not appear as a search result.

The Crawler (or Spider)

Starting with the list of 'known' websites from the submission form and from previous crawls, the crawler reads the content of the known websites, finds hyperlinks within those websites, then follows the hyperlinks to other web pages and content within the website, and to other websites. Websites that were not submitted might be found through the hyperlinks the crawler finds.
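The crawling loop described above can be sketched in a few lines. This is only an illustration, not how any particular search engine is built: the function crawl, the fetch parameter and the tiny FAKE_WEB dictionary standing in for the Internet are all made up for the example, so that it runs without a network connection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_urls, fetch, max_pages=100):
    """Visit pages breadth-first, starting from the submitted addresses."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)  # returns the page text, or None if unreachable
        if html is None:
            continue
        pages[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return pages

# A made-up, in-memory "web" used in place of real HTTP requests.
FAKE_WEB = {
    "http://a.example/": '<a href="/about">About</a> <a href="http://b.example/">B</a>',
    "http://a.example/about": "<p>About page</p>",
    "http://b.example/": '<a href="http://a.example/">Back to A</a>',
    "http://orphan.example/": "<p>Nobody links here</p>",
}

found = crawl(["http://a.example/"], FAKE_WEB.get)
print(sorted(found))
```

Only one address is submitted, yet the crawler also finds the second site by following a link. The orphan page is never reached because it was not submitted and nothing links to it.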

The Indexer

Built into every search engine is logic that indexes all the content found by the crawler into a data structure that contains the relationships between websites, keywords, links and content. How many times a word appears in the page, how closely certain words appear within pages, and which words are linked to which websites and content are all organized as an index. Indexers help build what is called search relevance, and help the search engine understand that CSCO is Cisco's stock ticker symbol and a search for those four letters should retrieve stock quotes.
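One common way to organize this kind of information is an 'inverted index', which maps each word to the pages it appears on. The sketch below records where each word occurs in each page, so both the number of appearances and the closeness of words can be worked out later. The names tokenize and build_index and the sample pages are invented for the example; a production indexer stores much more than this.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to {url: [positions of the word in that page]}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(url, []).append(position)
    return index

# Made-up page contents standing in for what a crawler collected.
PAGES = {
    "http://a.example/": "Cisco routers and Cisco switches",
    "http://b.example/": "Stock quote for CSCO",
}

index = build_index(PAGES)
print(index["cisco"])  # which pages contain the word, and where
print(len(index["cisco"]["http://a.example/"]))  # how many times it appears
```

Because the positions are kept, a later step could check how far apart two words are on the same page by comparing their position lists.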

All this information is stored in a data storage system that allows for parallel search and instant retrieval. When you use the search form, the form submits your search to the search engine's search function, which searches the index created by the Indexer. There is no way the search engines could actually search the entire Internet in less than a quarter of a second and give you an answer.
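The reason an index lookup is so fast can be shown with a small sketch. Instead of reading any web pages, the search function only looks up each query word in a ready-made index and combines the results. The INDEX contents, the search function and the simple ranking by word count are assumptions for illustration only; real relevance ranking is much more involved.

```python
# A made-up index: word -> {url: number of times the word appears there}.
INDEX = {
    "search": {"http://a.example/": 3, "http://b.example/": 1},
    "engine": {"http://a.example/": 2, "http://c.example/": 5},
    "crawler": {"http://b.example/": 4},
}

def search(index, query):
    """Return pages containing every query word, best match first."""
    words = query.lower().split()
    if not words:
        return []
    postings = [index.get(word, {}) for word in words]
    # Keep only the pages that appear in every word's entry.
    matching = set(postings[0])
    for entry in postings[1:]:
        matching &= set(entry)
    # Score each page by the total count of the query words on it.
    scored = [(sum(entry[url] for entry in postings), url) for url in matching]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for score, url in scored]

print(search(INDEX, "search engine"))
print(search(INDEX, "search"))
```

Each lookup touches only a few dictionary entries, no matter how many pages were crawled to build the index.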



Limitations of Search Engines

The Internet and all the content stored on it is VAST. Since the process of crawling and indexing the Internet can take days or weeks, search engine results can lag behind changes to website content, so links from a search engine may lead to a page that is no longer there, or to a page whose content has changed and is no longer relevant. To help reduce the impact of this, most search engines crawl popular websites much more frequently so they can detect changes more quickly. Still, it may take a long time for content changes to be recognized by search engines.
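As a rough illustration of visiting popular websites more often, the sketch below gives each site its own revisit interval and uses a priority queue to decide which site is due next. The function crawl_schedule, the site addresses and the 6 and 24 hour intervals are invented for the example and do not reflect any real search engine's schedule.

```python
import heapq

def crawl_schedule(sites, horizon_hours):
    """List (hour, url) visits up to the horizon.

    sites maps each url to its revisit interval in hours; a shorter
    interval means the site is treated as more popular.
    """
    heap = [(0, url) for url in sites]  # every site is due at hour 0
    heapq.heapify(heap)
    visits = []
    while heap and heap[0][0] <= horizon_hours:
        hour, url = heapq.heappop(heap)
        visits.append((hour, url))
        # Schedule the next visit one interval later.
        heapq.heappush(heap, (hour + sites[url], url))
    return visits

# Hypothetical intervals: the popular site is revisited four times as often.
SITES = {"http://popular.example/": 6, "http://quiet.example/": 24}

visits = crawl_schedule(SITES, 24)
for hour, url in visits:
    print(hour, url)
```

Over the same period the popular site is checked several times while the quiet one is checked only at the start and end, which is why a change on a less popular page can take longer to show up in search results.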



