What is a Search Engine?

If you use the Internet, you have probably used a search engine. Simply put, a search engine is a tool that helps you find what you are looking for faster than you could by examining every candidate in a collection in turn, which could take quite a long time!

Search engines are used to find many kinds of things: images, sounds, email addresses, and telephone numbers, to name a few. But the most widely used application of search engines is to find text--specifically, documents--that contain information you are interested in. Every HTML page on the Internet is a document, and most of them contain text that someone thinks is worth reading. It is only a matter of wading through the morass to find what you are really looking for. Enter the search engine.

The simplest form of a search engine is a tool or web site that organizes documents into categories. First you find the category you want, and then you browse everything in that category to find anything relevant. The popular term for this kind of tool on the Internet is a portal. A widely known example is Yahoo!, located at http://www.yahoo.com.

A slightly more complicated form of search engine is a tool that asks you what you want to find, and then goes and searches every document it knows of on your behalf to try to find something relevant. Some would call such a tool an agent. Simplicity in this case comes at a high cost. Such a tool must necessarily retrieve every document every time you conduct a search. Even if the documents are located on a fast hard disk, this method can take a long time. If you must download the documents from a network first, this method quickly becomes untenable.
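
The search-everything approach described above can be sketched in a few lines of Python. This is only an illustration, not how any particular product works: the function names (tokenize, scan_search) and the sample documents are hypothetical, and a "document" here is simply a string held in memory rather than a file or a web page.

```python
import re

def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def scan_search(documents, query):
    """Return the names of documents that contain every word of the query.

    Every document is read and tokenized again on each call, which is
    what makes this approach slow for large collections.
    """
    wanted = set(tokenize(query))
    matches = []
    for name, text in documents.items():
        # The whole document is examined, even if it turns out not to match.
        if wanted <= set(tokenize(text)):
            matches.append(name)
    return matches

# Hypothetical sample collection: document name -> document text.
SAMPLE_DOCS = {
    "intro.html": "A search engine helps you find documents quickly.",
    "portal.html": "A portal organizes documents into categories.",
    "index.html": "An index lets a search engine skip reading every document.",
}

print(scan_search(SAMPLE_DOCS, "search engine"))
```

The cost of each search grows with the total size of the collection, because nothing learned during one search is kept for the next.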

A refinement of the previous approach is to have the search engine analyze every document before a search, and record information in a database that will help it find relevant documents later on without needing to download every document to check for matches. This kind of search engine can be very fast.
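
One common way to record such information is an inverted index, which maps each word to the set of documents that contain it. The Python sketch below assumes that technique purely for illustration; the text does not say which method any given search engine uses, and the names build_index and indexed_search and the sample documents are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Analyze every document once and map each word to the documents containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return dict(index)

def indexed_search(index, query):
    """Return the sorted names of documents containing every query word.

    Only the index is consulted; the documents themselves are never reread.
    """
    words = tokenize(query)
    if not words:
        return []
    # Start from the documents containing the first word, then narrow down.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)

# Hypothetical sample collection: document name -> document text.
SAMPLE_DOCS = {
    "intro.html": "A search engine helps you find documents quickly.",
    "portal.html": "A portal organizes documents into categories.",
    "index.html": "An index lets a search engine skip reading every document.",
}

index = build_index(SAMPLE_DOCS)  # done once, before any search
print(indexed_search(index, "search engine"))
```

The expensive step, reading every document, happens once when the index is built. Each later search only looks up a handful of words in the index.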

The only drawback, in some cases, is that the database can be quite large, often similar in size to the sum of all the documents it references. Usually, larger databases go hand in hand with more sophisticated searching methods, which can find what you are looking for more quickly and more accurately because the search engine has better knowledge of all the documents. Many commercial search engines store databases--otherwise known as indexes--whose sizes are greater than 30% of the original documents' sizes. A few search engines, like the one in SiteSurfer, can pack powerful searching functionality into databases smaller than 5 to 10% of the original sizes, depending on how the indexed documents are written.
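
To make these percentages concrete, the following Python sketch converts between index size and document size. The function names and the 200 MB collection are hypothetical, chosen only to illustrate the arithmetic; they are not measurements of any real search engine.

```python
def index_size_percent(index_bytes, documents_bytes):
    """Return the index size as a percentage of the total document size."""
    if documents_bytes <= 0:
        raise ValueError("documents_bytes must be positive")
    return 100.0 * index_bytes / documents_bytes

def estimated_index_bytes(documents_bytes, percent):
    """Estimate the index size for a collection, given a size percentage."""
    return documents_bytes * percent / 100.0

# Hypothetical collection of 200 MB of documents.
collection = 200_000_000
print(estimated_index_bytes(collection, 30))  # index at 30% of the documents
print(estimated_index_bytes(collection, 5))   # index at 5% of the documents
```

Under these assumed figures, a 30% index would occupy 60 MB, while a 5% index would occupy 10 MB for the same collection.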