A Student's Guide to Search Engines

The Internet consists of billions upon billions of individual files, located on millions of different web sites. Certainly, then, the 'Net is a veritable treasure trove of information for students, who routinely turn to the Web for help with their coursework. Yet, all this knowledge is of little use if you don't know how to locate and access what you need, quickly and easily. Enter search engines!

Search engines are essentially tools that allow you to search for information available on the Web using keywords and search terms. Rather than searching the Web itself, however, you are actually searching the engine's database of files.

Search engines are actually three separate tools in one. The spider is a program that "crawls" through the Web, moving from link to link, looking for new web pages. Once it finds new sites or files, it adds them to the search engine's index. This index is a searchable database of all the information the spider has found on the Web. Some engines index every word in each document, while others select certain words (such as those occurring most often). The search engine itself is a piece of software that allows users to search the engine's database. Clearly, an engine's search is only as good as the index it's searching.
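
To make the three parts concrete, here is a minimal sketch in Python. It is only an illustration, not how any real engine is built: the tiny in-memory "Web" (TOY_WEB) and the function names crawl, build_index, and search are all made up for this example, and the index stores every word of every page it reaches.

```python
from collections import deque

# Hypothetical toy "Web": page name -> (page text, links to other pages).
TOY_WEB = {
    "home": ("welcome to the student guide", ["engines", "libraries"]),
    "engines": ("search engines index the web", ["home", "spiders"]),
    "spiders": ("a spider crawls from link to link", []),
    "libraries": ("academic databases for student research", ["home"]),
    "orphan": ("no page links here so the spider never finds it", []),
}


def crawl(web, start):
    """Spider: follow links breadth-first from `start`; return pages in visit order."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in web[page][1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def build_index(web, pages):
    """Index: map every word to the set of crawled pages that contain it."""
    index = {}
    for page in pages:
        for word in web[page][0].split():
            index.setdefault(word, set()).add(page)
    return index


def search(index, query):
    """Search software: return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


pages = crawl(TOY_WEB, "home")
index = build_index(TOY_WEB, pages)
print(sorted(search(index, "student")))  # pages reached by the spider that mention "student"
print(sorted(search(index, "spider")))   # the "orphan" page is missing: it was never crawled
```

Notice that the "orphan" page contains the word "spider" but never shows up in the results. Nothing links to it, so the spider never found it and it never made it into the index.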

So, when you run a query using a search engine, you're really only searching the engine's index of what's on the Web, as opposed to the entire Web. No one search engine is capable of indexing everything on the Web - there's just too much information out there! Consequently, search engine queries overlook a great deal of information, including breaking news, documents, multimedia files, images, tables, and other data. Collectively, these types of resources are referred to as the deep or invisible Web. They're buried deep in the Web and are invisible to search engines.

Search engines usually feature advertisements in the form of paid results. Those sites that pay a fee may be included in your search results, usually at the very top of the list. Some engines group advertisers separately, for instance, on the side of the page. While search engines must clearly label all advertisers, they vary widely in their compliance with FTC rules.

The method by which an engine determines how its search results will be presented to the user is called ranking. Early search engines, now called first generation search engines, used term ranking to organize their results. In term ranking, a result's importance is determined by how often and where the search term appears in the document. Most of today's search engines are much more sophisticated than this. These second generation search engines use a number of methods, sometimes in combination, to rank their results.
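
As a rough illustration of term ranking, the Python sketch below scores each document by how often the search term appears and where it appears. The details are assumptions made for this example only: a match in the title counts three times as much as a match in the body (the title_weight value), and the function names and sample documents are invented.

```python
def term_rank_score(term, title, body, title_weight=3):
    """Score a document by how often and where `term` appears.

    A match in the title counts `title_weight` times as much as a match
    in the body. The weight is an illustrative assumption.
    """
    term = term.lower()
    title_hits = title.lower().split().count(term)
    body_hits = body.lower().split().count(term)
    return title_weight * title_hits + body_hits


def rank(term, documents):
    """Return titles ordered from highest to lowest score, skipping zero scores."""
    scored = [(term_rank_score(term, title, body), title) for title, body in documents]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]


# Hypothetical sample documents: (title, body).
DOCS = [
    ("Cooking basics", "volcano cake recipe for a volcano party"),
    ("Volcano facts", "a volcano erupts when magma rises"),
    ("Travel notes", "we hiked near a volcano"),
    ("Gardening", "tomatoes need sun"),
]

print(rank("volcano", DOCS))
# prints ['Volcano facts', 'Cooking basics', 'Travel notes']
```

In this sample, the cake recipe outranks the travel notes simply because it repeats the word, which hints at why counting terms alone can be a crude measure of importance.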

In relevancy ranking, second generation engines use various algorithms to determine a document's relevance to your search. This is by far the most popular ranking method, though engines often allow you to sort your results alphabetically or group them by site, source, and/or concept.

In addition to ranking method, a number of other features distinguish one engine from another. You will encounter two basic types of search engines. Individual search engines are just that - individual engines which compile and search their own databases. Meta search engines, on the other hand, search several individual search engines at once. Meta engines do not maintain their own databases, but search those assembled by other engines.

Meta engines present their search results in one of two ways. Using separate retrieval, they may group the results collected from each individual engine separately, so that you have a list of results from each engine. Separate retrieval often results in many redundant "hits." Collated retrieval solves this problem by weeding out duplicate results obtained from different individual engines. Many engines allow you to choose between the two or even use collated retrieval while still grouping your results by search engine source.
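
The difference between the two styles can be sketched in a few lines of Python. Everything here is hypothetical: the engine names, the URLs, and the function names are invented for illustration, and real meta engines may merge and order results quite differently.

```python
# Hypothetical results from three individual engines: engine name -> ordered URLs.
ENGINE_RESULTS = {
    "EngineA": ["example.org/a", "example.org/b", "example.org/c"],
    "EngineB": ["example.org/b", "example.org/d"],
    "EngineC": ["example.org/a", "example.org/d", "example.org/e"],
}


def separate_retrieval(engine_results):
    """Keep one result list per engine; duplicates across engines remain."""
    return {engine: list(urls) for engine, urls in engine_results.items()}


def collated_retrieval(engine_results):
    """Merge all lists into one, listing each URL only once.

    Each URL is paired with the engines that returned it, so results can
    still be traced back to their sources.
    """
    sources = {}
    for engine, urls in engine_results.items():
        for url in urls:
            sources.setdefault(url, []).append(engine)
    return list(sources.items())


separate = separate_retrieval(ENGINE_RESULTS)
collated = collated_retrieval(ENGINE_RESULTS)

print(sum(len(urls) for urls in separate.values()))  # prints 8 (hits, duplicates included)
print(len(collated))                                 # prints 5 (unique hits)
for url, engines in collated:
    print(url, engines)
```

Keeping the list of source engines next to each collated result mirrors the option of using collated retrieval while still seeing which engine each hit came from.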

Search engines can be useful when conducting research, as long as you exercise caution. Always keep an eye out for paid results, and try to avoid engines that don't disclose their advertisers. Search engines are a good choice when you have a very specific or obscure topic to research. Queries for broad or popular topics will return more results than is practical, and the information found may be of questionable quality. For such topics, you should stick to academic databases unless you are having trouble finding enough information there. Search engines can also be helpful if you're looking for specific types of files, such as pictures.

Meta engines are even more helpful in tackling those "needle in a haystack" searches, since they conduct several searches at once. For this reason, they are also good if you want to get an overview of what is out there, or if your topic is an uncommon one. Keep in mind, though, that meta searches will retrieve only the most relevant results from each engine, since they're limited in the number of results they can uncover. Another casualty of meta searches is flexibility. Because you're searching many engines at once, you're granted access only to the search features that they all have in common.

Finally, you should not rely on any one engine. Recall that each engine draws on a different database (or group of databases, as is the case with meta engines). Some cover portions of the deep Web, while others go nowhere near it. Accordingly, you will get a different set of results depending on which engine you use. Rather than become familiar with only one, try out a number of them and pick a dozen or so that you like. While you should read each engine's instructions before running a search, don't try to memorize all the details, since they're likely to change anyway!

Search engines are an excellent means of locating otherwise impossible-to-find information online. There are many options available for the discriminating student researcher, so be selective and choose an engine that meets your unique needs!

