When you are reading a mystery novel, some pages are obviously more important than others. One page of a whodunit will feature the ghastly murder by an unidentified hand; another page, usually the last, will have the smooth-thinking inspector calling out the murderer in a hotel bar or a ballroom on a luxury cruise ship. Either way, the “Col. Mustard did it with the lead pipe in the billiard room” page is a very important page. If you were to rank the pages in the mystery novel, the page where the identity of the murderer is revealed would in all likelihood be number one with a bullet.
But who decides which page is important? What if I simply want to know what the inspector’s dim-witted assistant was wearing in the Monte Carlo scene on page 42? I’d say page 42 was the more important page, based on my criteria. Good thing there’s a page-ranking tool, oddly enough called PageRank™. The trademark is owned by Google, but the patent under the hood belongs to Stanford University, which licensed exclusive rights to Google in exchange for 1.8 million shares of Google stock; those shares were sold in 2005 for $336 million USD. Over the next two blog installments we’ll pop the hood on PageRank™ and see what makes it tick. The original developer was Larry Page, hence PageRank. Now would be a good time to mention that this exploration is neither pro nor con Google, and that Yahoo! developed a similar tool, called Webrank, that was highly touted in 2004.
According to Wikipedia, PageRank is a link analysis algorithm that assigns a numerical weighting to each element of a hyperlinked set of documents with the purpose of measuring its relative importance within the set. I’ll bet Larry Page wrote that.
Although Google says that PageRank considers more than 500 million variables and 2 billion terms, the first thing to understand about PageRank is that it does not operate alone. Google has stated on its Technology Review page that more than 200 signals are used in concert with the PageRank algorithm to determine which pages on the web matter most to the most people. After PageRank and its criteria-measuring pals have had a shot at analyzing pages, Google conducts a hypertext-matching analysis to determine which pages are relevant to the specific search being conducted. Google claims that by combining overall importance and query-specific relevance, it is able to put the most relevant and reliable results at the top of the search results. On top of this arrangement, PageRank also considers the importance of the pages that link to the requested page: some pages are universally considered of greater value, and if the requested page has a link from one of these heavyweights, its rank will be boosted simply by association.
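For readers who want to see the "boosted by association" idea in action, here is a minimal sketch of the textbook PageRank iteration in Python. It is an illustration only, not Google's production code: the four-page `toy_web`, the function name `pagerank`, the damping value of 0.85 (the figure commonly quoted for the published algorithm) and the fixed 50 iterations are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated averaging (a sketch, not Google's code).

    `links` maps each page name to the list of pages it links to.
    Every linked-to page must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    # Start by treating every page as equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of its links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outgoing links spreads its rank over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: nobody links to "orphan".
toy_web = {
    "home": ["news", "blog"],
    "news": ["home"],
    "blog": ["home", "news"],
    "orphan": ["home"],
}

ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.4f}")
```

In this toy web, "home" ends up on top because every other page links to it, while "orphan" sits at the bottom because nothing links to it at all. A link from a high-ranked page passes along more rank than a link from a low-ranked one, which is the heavyweight effect described above.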
Visit Wikipedia for a detailed account of the mathematics behind PageRank and the damping factor, but as a guy who graduated high school with Math for Gorillas, I’m good for now.
