PHP PageRank class
I covered the background of this on another blog; this is a simple PHP class script to calculate the PageRank of the pages in a site.
2.1.1 Description of PageRank Calculation
Academic citation literature has been applied to the web, largely by counting citations or backlinks to a given page. This gives some approximation of a page’s importance or quality. PageRank extends this idea by not counting links from all pages equally, and by normalizing by the number of links on a page. PageRank is defined as follows:
We assume page A has pages T1…Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages’ PageRanks will be one.
PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web. Also, a PageRank for 26 million web pages can be computed in a few hours on a medium size workstation. There are many other details which are beyond the scope of this paper.
From: The Anatomy of a Large-Scale Hypertextual Web Search Engine (Sergey Brin and Lawrence Page, page hosted at Stanford)
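To make the formula a bit more concrete, here is a minimal PHP sketch that evaluates it once for a single page. The function name `pagerankOf` and the array layout are my own illustration, not something from the paper.

```php
<?php
// Hypothetical helper: evaluates the formula once for one page A.
// $inlinks is a list of [PR(T), C(T)] pairs, one per page T that links to A.
function pagerankOf(array $inlinks, float $d = 0.85): float
{
    $sum = 0.0;
    foreach ($inlinks as [$pr, $outCount]) {
        $sum += $pr / $outCount;   // PR(T) / C(T)
    }
    return (1 - $d) + $d * $sum;
}

// Example: A is linked from T1 (PR 1, 2 outgoing links)
// and from T2 (PR 1, 1 outgoing link).
echo pagerankOf([[1.0, 2], [1.0, 1]]), "\n";
```

With d at 0.85 this works out to 0.15 + 0.85 * (0.5 + 1), so about 1.425.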
I checked Ian Rogers' site (he seems to be a dancer, cool); he made some really nice examples of PageRank calculation that are excellent to start with.
I also looked at things like guaranix, but that was table based, and I wanted something I can link into a spider and run as one simple calculation regardless of the number of pages.
for ($ii = 0; $ii < 40; $ii++) {
    foreach ($Web->Pages as $Page) {
        // sum of the shares this page receives from its inlinking pages
        $NewValue = 0;
        foreach ($Page->IncomingLinks as $Link) {
            $ValueInlinkingPage = $Web->Pages($Link->url)->Value;
            $LinksInlinkingPage = $Web->Pages($Link->url)->OutgoinglinksCount;
            $NewValue = $NewValue + $ValueInlinkingPage / $LinksInlinkingPage;
        }
        $Page->Value = .15 + .85 * $NewValue;
    }
}
From each linking page you get (pagevalue / pagelinks), your fair share. Every page starts out at 1, and then you run the calculation 20 to 40 times, with each page receiving all its shares and passing them on. The page values shift towards a new balance, in math the ‘eigenvector of the normalised matrix’ (duh?). I am too stupid to do it with matrix math, so I use PHP classes.
This one calculates the rank of site files in a small model; the extended version is fed by a spider. I put it on its own page.
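For comparison, here is the same idea as a minimal sketch with plain arrays. It is not the class version: the function name `pagerank`, the `$links` layout, and the stop threshold `$eps` are assumptions of mine. Unlike the loop above, it calculates every page from the values of the previous round and stops as soon as no value changes by more than the threshold, instead of always running a fixed number of rounds.

```php
<?php
// Illustrative model: page => list of pages it links to.
// Assumes every linked page also appears as a key.
$links = [
    'A' => ['B', 'C'],
    'B' => ['C'],
    'C' => ['A'],
    'D' => ['C'],
];

function pagerank(array $links, float $d = 0.85, float $eps = 1e-6, int $maxRounds = 100): array
{
    // Every page starts out at 1.
    $rank = array_fill_keys(array_keys($links), 1.0);

    for ($round = 0; $round < $maxRounds; $round++) {
        // Collect the fair shares each page receives in this round.
        $received = array_fill_keys(array_keys($links), 0.0);
        foreach ($links as $from => $targets) {
            foreach ($targets as $to) {
                $received[$to] += $rank[$from] / count($targets);
            }
        }

        // Apply the formula and remember the largest change.
        $maxChange = 0.0;
        foreach ($received as $page => $sum) {
            $new = (1 - $d) + $d * $sum;
            $maxChange = max($maxChange, abs($new - $rank[$page]));
            $rank[$page] = $new;
        }
        if ($maxChange < $eps) {
            break;   // the values have settled
        }
    }
    return $rank;
}

print_r(pagerank($links));
```

For this four-page model the values should settle at roughly A 1.49, B 0.78, C 1.58 and D 0.15.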
This one is for concepts.

<?php

class Spider {

    public $index;
    //all pages in the model, keyed by url
    public $MyFiles = array();

    //return the page for $code, creating it on first use
    public function MyFiles($code) {
        if (!isset($this->MyFiles[$code])) {
            $this->MyFiles[$code] = new MyFile($code);
        }
        return $this->MyFiles[$code];
    }
}

class MyFile {

    public $url;
    public $Pagerank = 0;
    //every page starts out at 1
    public $Value = 1;
    public $MyLinksIn = array();
    public $MyLinksOut = array();
    public $LinkOutCount = 0;

    public function __construct($index) {
        $this->url = $index;
    }

    //return the incoming link from page $code, creating it on first use
    public function MyLinksIn($code) {
        if (!isset($this->MyLinksIn[$code])) {
            $this->MyLinksIn[$code] = new MyLinkIn($code);
        }
        return $this->MyLinksIn[$code];
    }

    //return the outgoing link to page $code, creating it on first use
    public function MyLinksOut($code) {
        if (!isset($this->MyLinksOut[$code])) {
            $this->MyLinksOut[$code] = new MyLinkOut($code);
        }
        return $this->MyLinksOut[$code];
    }
}

class MyLinkOut {

    //url of the page this link points to
    public $url;
    public $count;
    public $nofollow;
    public $title;
    public $rel;

    public function __construct($index) {
        $this->url = $index;
    }
}

class MyLinkIn {

    //url of the page this link comes from
    public $url;
    public $count;
    public $nofollow;
    public $title;
    public $rel;

    public function __construct($index) {
        $this->url = $index;
    }
}

//here we go: make a spider, call it 'core'
$core = new Spider;

//first the outgoing links: a to b, a to c, 2 links, etcetera
$myfl = $core->MyFiles("A");
$myL = $myfl->MyLinksOut("B");
$myL = $myfl->MyLinksOut("C");
$myfl->LinkOutCount = 2;

$myfl = $core->MyFiles("B");
$myL = $myfl->MyLinksOut("C");
$myfl->LinkOutCount = 1;

$myfl = $core->MyFiles("C");
$myL = $myfl->MyLinksOut("A");
$myfl->LinkOutCount = 1;

//d links to c, matching the incoming links of c below
$myfl = $core->MyFiles("D");
$myL = $myfl->MyLinksOut("C");
$myfl->LinkOutCount = 1;

//then the incoming links, a collection that holds the page-IDs connected to the page.
//later on, I query per page the inlinking pages for value and linkcount
//and take my fair share
$myfl = $core->MyFiles("A");
$myL = $myfl->MyLinksIn("C");

$myfl = $core->MyFiles("B");
$myL = $myfl->MyLinksIn("A");

$myfl = $core->MyFiles("C");
$myL = $myfl->MyLinksIn("A");
$myL = $myfl->MyLinksIn("B");
$myL = $myfl->MyLinksIn("D");

//calculate pageranks, here I take 40 iterations, but 20 will do as well
for ($ii = 0; $ii < 40; $ii++) {

    //take the page collection, and for each page…
    foreach ($core->MyFiles as $Fl) {

        //start the sum of fair shares at zero
        $val = 0;

        //take the incoming links collection
        foreach ($Fl->MyLinksIn as $Li) {
            //retrieve the inlinking page by url (a b c d) and get value and linkcount
            $In = $core->MyFiles($Li->url)->Value;
            $InT = $core->MyFiles($Li->url)->LinkOutCount;
            //keep adding the fair shares
            $val = $val + $In / $InT;
        }

        //set the new page-value to .15 plus .85 times the sum-of-shares
        $Fl->Value = .15 + .85 * $val;
    }

    //print the result
    foreach ($core->MyFiles as $Fl) {
        echo $Fl->Value." ".$Fl->url." ";
    }
    echo "<br />";
}
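The small model above enters every link twice, once as an outgoing link and once as an incoming link, and sets the link count by hand. When a spider feeds the model, it may be easier to register each link only once. A minimal sketch of that idea, using plain arrays rather than the classes above; the function `addLink()` and the array keys `in`, `out` and `outCount` are hypothetical names, and skipping repeated links is just one possible choice:

```php
<?php
// Hypothetical helper, not part of the script above: records one link
// from $from to $to and keeps both directions and the out-count in sync.
function addLink(array &$pages, string $from, string $to): void
{
    foreach ([$from, $to] as $url) {
        if (!isset($pages[$url])) {
            $pages[$url] = ['in' => [], 'out' => [], 'outCount' => 0];
        }
    }
    if (in_array($to, $pages[$from]['out'], true)) {
        return;   // ignore a repeated link between the same two pages
    }
    $pages[$from]['out'][] = $to;
    $pages[$from]['outCount']++;
    $pages[$to]['in'][] = $from;
}

$pages = [];
// The same small model: each pair is [linking page, linked page].
foreach ([['A', 'B'], ['A', 'C'], ['B', 'C'], ['C', 'A'], ['D', 'C']] as [$from, $to]) {
    addLink($pages, $from, $to);
}

print_r($pages['C']['in']);   // the pages that link to C
```

Because both directions come from the same call, the incoming and outgoing collections cannot drift apart.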
I got one of these toolbar-query snippets (which only return the 0-10 result; PageRank itself is calculated on a different scale), so now I can start comparing the spider/calculation against the standing ranks assigned by Google and tune the model.









When I saw your implementation I realised there was no need to get complex with matrices and vectors :).
Thanks!