Diving in Deep: Using the Deep Web for Legal Research

Many of us, if not most, use popular online search engines like Google and Yahoo! to search the web. While these search tools often locate what we want, we might still wonder to ourselves, "What else can I find outside of Google?"

General search engines retrieve web pages by employing "spiders" or robots to visit web pages periodically and index their content. These general search engines, however, are not effective in locating "deep" or "invisible" Web pages--pages that spiders or robots cannot reach by following hyperlinks. Examples of content not indexed by search engines include image files (e.g., TIFF and GIF), streaming media (e.g., Flash and MP3), specialized searchable databases, and pages intentionally excluded by a web page designer. Since the deep Web is, by some estimates, 400-550+ times larger in size than the "surface" or "visible" Web, the importance of deep Web searching becomes all the more apparent.
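For readers curious about the mechanics, here is a minimal sketch of why link-following misses some pages. It is a toy model, not how any particular search engine works: the page names and the LINKS table are invented for illustration, and nothing is fetched from the network.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
LINKS = {
    "home": ["about", "search-form"],
    "about": ["home"],
    "search-form": [],       # a query form; results are generated on demand
    "results?q=ozone": [],   # exists only after a user submits the form
}

def crawl(start, links):
    """Return the set of pages a link-following spider can reach from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

visible = crawl("home", LINKS)   # pages the spider can index
deep = set(LINKS) - visible      # pages no hyperlink leads to
print(sorted(visible))
print(sorted(deep))
```

In this sketch the database results page lands in the "deep" set simply because no hyperlink points to it, which is the situation described above for specialized searchable databases.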

Deep Web searching is particularly appropriate when specific or precise information, such as statistics or data, is needed. Deep Web searching is also appropriate when authoritative, timely, and exhaustive information is needed.

If general search engines do not retrieve what you want, deep Web searching may be worth a try. Deep Web pages may be identified by using subject directories and search engines.

For academic research, well-regarded subject directories that canvass the deep Web include the Librarians' Internet Index and Infomine. General deep Web search engines include Incywincy and OAIster. For in-depth deep Web searching, consider meta-search engines such as SurfWax and Copernic Agent.

If you wish to locate other deep Web search engines, type your key words in a general search engine such as Google, followed by "database." For instance, if "'air pollution' and database" is entered into Google, you will retrieve the Environmental Protection Agency's AirData web site, which provides "access to air pollution data for the entire United States." After tinkering with the AirData web site, I was able to generate a 2007 Philadelphia Air Quality Index Report (below).
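If you run this kind of search often, the keyword-plus-"database" pattern is easy to script. The sketch below only builds the search address; it does not contact Google. The function names are made up for this post, and the address format (a "q" parameter on google.com/search) is an assumption based on how Google search links commonly look.

```python
from urllib.parse import urlencode

def database_query(keywords):
    """Quote multi-word key words as a phrase and append 'and database'."""
    phrase = f'"{keywords}"' if " " in keywords else keywords
    return f"{phrase} and database"

def search_url(keywords, base="https://www.google.com/search"):
    """Build a search address for the query; 'base' is an assumed endpoint."""
    return base + "?" + urlencode({"q": database_query(keywords)})

print(database_query("air pollution"))
print(search_url("air pollution"))
```

Pasting the printed address into a browser should be equivalent to typing the query by hand. Swapping "database" for "statistics" inside database_query is a natural variation.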

If you are already visiting a web site and you wish to determine if deep Web searching is available, review the site map to see if the words "database" or "statistics" appear. In addition, you may also want to search for "database" within the web site's internal search engine.
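The site map check can also be done mechanically once you have copied the site map's text. The following is a rough sketch under simple assumptions: the sample site map text is invented, and the check looks only for the two words mentioned above (plus a plural "databases"), ignoring capitalization.

```python
import re

KEYWORDS = ("database", "statistics")

def deep_web_hints(site_map_text, keywords=KEYWORDS):
    """Return the keywords that appear as whole words in the site map text."""
    found = []
    for word in keywords:
        # Optional trailing "s" so that "Databases" also counts.
        pattern = r"\b" + re.escape(word) + r"s?\b"
        if re.search(pattern, site_map_text, re.IGNORECASE):
            found.append(word)
    return found

# Hypothetical site map text, copied from a page by hand.
sample = "Home | About Us | Air Quality Statistics | Databases | Contact"
print(deep_web_hints(sample))
```

An empty result does not prove that a site has no searchable database; it only means these two words did not appear in the text you supplied.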

While many previously invisible pages are now visible with the use of general search engines, the breadth, depth, and weight of the deep Web provide a glimpse into the information world beyond Google. As the title of this post indicates, however, the deep Web is still an emerging front in the field of research and requires time, effort, and, sometimes, some additional assistance. With that in mind, if you are interested in learning more about your options regarding the deep Web, don't hesitate to contact me for more information.