
Search engines, data algorithms improve

Technophile

By: Kevin Lynch

Posted: 10/15/04

In the beginning, there was the ARPANET. And then there was the Internet. And the Data poured upon the servers like manna from heaven. And then Al Gore said, "Let there be blinking lights," and there were blinking lights. All right, so maybe the creation of the Internet wasn't as fantastic as that, and no, Al Gore did not create the Internet, contrary to popular belief. However, if he had, maybe the Internet would be much easier to navigate, with pertinent information finding you for a change, and not the other way around. That probably will never happen, with or without Al Gore and his algorithms, so the problem remains: how do you search through all of the information found on the Internet, relevant or otherwise, accurately and efficiently?

The answer to that is now Google. Or is it? Back in the stone ages of computing, before the Internet as we know it today existed, the problem had already been noticed. In 1990, the primary method of file storage and retrieval was the File Transfer Protocol (FTP). At the time, the only way to tell what information was stored on which server was by word of mouth, such as an e-mail or a message board posting. Archie was the first tool to solve this problem, albeit in a very rudimentary fashion by today's standards. Archie connected to various anonymous FTP sites, gathered the names of the files stored there and archived them in a publicly accessible, searchable database. Archie was a revolutionary system that can be considered the forerunner of modern search engines.
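For the curious, the idea behind Archie can be sketched in a few lines of Python. This is only an illustration of indexing file names by site, not Archie's actual code: the host names, the file paths and the `build_index` and `search` helpers are all made up for the example, and the listings are hard-coded rather than fetched over FTP.

```python
from collections import defaultdict

# Hypothetical listings: host name -> file paths seen on that anonymous FTP site.
LISTINGS = {
    "ftp.example.edu": ["/pub/tools/editor-1.0.tar.Z", "/pub/docs/README"],
    "ftp.example.org": ["/mirrors/editor-1.0.tar.Z", "/pub/games/maze.tar.Z"],
}

def build_index(listings):
    """Map each lowercase file name to the (host, path) pairs where it was seen."""
    index = defaultdict(list)
    for host, paths in listings.items():
        for path in paths:
            name = path.rsplit("/", 1)[-1].lower()  # keep only the file name
            index[name].append((host, path))
    return index

def search(index, term):
    """Return sorted (host, path) pairs whose file name contains the term."""
    term = term.lower()
    hits = []
    for name, locations in index.items():
        if term in name:
            hits.extend(locations)
    return sorted(hits)

index = build_index(LISTINGS)
for host, path in search(index, "editor"):
    print(host, path)
```

Only names are indexed here, never file contents, which is roughly why such a tool looks rudimentary next to a full-text search engine.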

As newer, improved technology became available, searching methods became more efficient. Several search systems have come and gone, introducing better search algorithms and methods. Some have been wildly successful, while others have, for one reason or another, faded away. In the mid-nineties, various spider-based search engines, such as HotBot, Excite and WebCrawler, were created to roam webpages and follow the links between them. Others, such as Yahoo!, took a different approach to taming the Internet. Not a true search engine per se, Yahoo! is a searchable directory, allowing for a much more natural approach to data mining. However, at the time, no two search engines would produce the same results. A query that yielded poor results on one search engine could yield a gold mine on another. Eventually, MetaCrawler was created to collect and reformat the results of all of the other major search engines, angering those search engines over lost users and pleasing users with better results.
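To make the metasearch idea concrete, here is a minimal Python sketch of combining ranked results from several engines. It is not MetaCrawler's actual method, which is not described here; the reciprocal-rank scoring, the `merge_results` function, the engine names and the URLs are all assumptions chosen for illustration.

```python
from collections import defaultdict

def merge_results(results_by_engine):
    """Merge ranked URL lists, scoring each URL by its summed reciprocal rank."""
    scores = defaultdict(float)
    for ranked_urls in results_by_engine.values():
        seen = set()
        for rank, url in enumerate(ranked_urls, start=1):
            if url in seen:  # ignore duplicates within one engine's list
                continue
            seen.add(url)
            scores[url] += 1.0 / rank
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical result lists from two made-up engines.
results = {
    "engine_a": ["a.example/1", "b.example/2", "c.example/3"],
    "engine_b": ["b.example/2", "d.example/4"],
}
merged = merge_results(results)
print(merged)
```

A page that several engines agree on floats to the top, which is one plausible way a combined list could beat any single engine's results.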

However, there has not been much activity in search engine technology since the appearance of Google. All of the other great and unstoppable search engines have long since been forgotten, or at the very least have been dragged from the top. So what is it that makes Google so great?

I have been using Google since its inception in late 1998, a time when Yahoo! still controlled the boards. However, within a short time, Google managed to rise to the top, using its tactics of surprise and innovation. Unlike all of the other flashy search engines, filled to the brim with animated ads, Google opted for a cleaner, much more user-friendly interface. Without distractions or long load times, Google used text almost exclusively, even for Google AdWords, text-based ads which are usually relevant to the queried search. Google has expanded a great deal since its humble beginnings, offering everything from advanced image searching to searching through thousands of mail order catalogs or even shopping for the best price online through Froogle. What's a search engine without indexed sites, though? Google's index now spans more than 3.8 billion documents, including information stored in standard webpages, Microsoft Word Document (DOC) files, Adobe Portable Document Format (PDF) files and more.

To gain an edge over its competitor Yahoo!, Google added user-friendly features to its repertoire. Google News, a searchable compilation of all the latest breaking news, was created so users could easily stay on top of current events and get more than one view of the story. Google's latest creation, Gmail, is a friendly webmail service offering 1000 megabytes of data storage and an extremely intuitive interface. But what's with all of these gimmicks? Sure, they are great additions to the already powerful search engine, but doesn't Google have any more ways to make searching, the foundation of its empire, better and more accurate?

One way in which Google has improved its methods of searching is by including Google Sets. These sets are groups of related words. For example, a set search of the words "Gentoo," "Debian" and "Mandrake" predicts the related words "SuSE," "Slackware," "Red Hat" and many other Linux distributions and related topics. This search tool is very useful for finding related topics which a user might not have considered. Though this idea is not promoted by Google, as it is still being improved, other search engines have implemented it. Clusty, a new search engine which was officially launched as a beta service at the end of September, uses the same concept as sets, or clusters, as Clusty calls them. The purpose of Clusty is not to search pages quickly, but to add value to the information that has been gathered, which is exactly what Clusty does. Clusty groups everything and anything into clusters of data. Whether it is a news topic, such as the presidential election, or a web search for Nobel laureates, everything is filtered into topics and clustered. In addition to the almost basic web search and nearly standard news search, Clusty takes categories one step further and includes customizable tabs to search the Wikipedia encyclopedia, browse through blogs, or peruse the latest gossip from all of the major entertainment sites, clustered by topic, naturally.
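How Clusty actually builds its clusters is not something this column can vouch for, but the general idea of grouping results by topic can be sketched in Python. The version below is a deliberately crude stand-in: it labels each title with the keyword it shares with the most other titles. The stopword list, the sample titles and the `keywords` and `cluster_titles` helpers are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

# A tiny, made-up stopword list; a real system would need a far larger one.
STOPWORDS = {"the", "a", "an", "of", "in", "for", "and", "to", "on"}

def keywords(title):
    """Return the set of lowercase words in a title, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOPWORDS}

def cluster_titles(titles):
    """Group titles under the keyword each shares with the most other titles."""
    doc_freq = Counter()
    for title in titles:
        doc_freq.update(keywords(title))
    clusters = defaultdict(list)
    for title in titles:
        shared = [w for w in keywords(title) if doc_freq[w] > 1]
        if shared:
            # Most widely shared word wins; ties are broken alphabetically.
            label = min(shared, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)

# Hypothetical search result titles.
titles = [
    "Nobel Prize in Physics announced",
    "Nobel Peace Prize laureate named",
    "Election debate recap",
    "Polls tighten before election",
    "Local bakery opens",
]
for label, group in cluster_titles(titles).items():
    print(label, group)
```

With these sample titles, the two Nobel stories land in one cluster, the two election stories in another, and the bakery ends up under "other".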

Clusty, however, is not the only new search engine on the net. Just a few weeks prior to Clusty's debut, Amazon introduced its own A9 search engine. A9 is a fully configurable search engine which allows users to see what they want and only what they want. Built on top of Google's search engine, A9, like Clusty, combines customizability with power. It includes built-in searching of the Internet Movie Database and, like Clusty, an encyclopedia powered by GuruNet. Where A9 attempts to surpass the others is in portability. After signing up for a free account, A9 users can do a search on a topic in one location, type up some notes on the different links, and save the best bookmarks for later. The ability to save searches, notes and bookmarks will allow for a truly mobile search experience.

With all of the newcomers to the search engine market, can a young startup succeed in bringing down the giant? Sure, Clusty offers advanced clustering by topic, but it is not so advanced that a user would be unable to spot the same differences from the title or the short snippet displayed on the search page. After all, it's not hard at all to figure out if "milf" is referring to the Moro Islamic Liberation Front, an American Pie reference, or something else entirely. Also, with A9's ability to track and store searches by user, many of the security-conscious and the conspiracy theorists will avoid the site like the plague, as they do with Google's Gmail. To steal a movie quote from Troy, I picture Google waiting patiently for a challenger, as Achilles did with the Thessalian army, shouting "Is there no one else?"

Kevin Lynch is a sophomore majoring in computer engineering.
© Copyright 2009 The Triangle