Crappy the Pagerank Spider, v1.00
This is the very simple version, and it has some minor hiccups. It spiders a site, and you can have it compose a Google sitemap and a rudimentary PageRank map (I haven’t programmed it to treat www.domain.com/ and www.domain.com/index.doc as the same page).
It does, however, work brilliantly for spotting leaks in WordPress themes and other mayhem and mischief.
At the core: the PageRank calculation
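The www.domain.com/ versus www.domain.com/index.doc hiccup could probably be handled by normalizing every URL before it is used as a key. This is only a sketch of one way to do it, not what Crappy does: the function name `canonicalUrl` and the default index file name are my assumptions, and URLs without a scheme are returned untouched.

```php
<?php
// Hypothetical helper: collapse a directory URL and its index file into one key.
// The index file name is an assumption; pass whatever your server really uses.
function canonicalUrl(string $url, string $indexFile = 'index.php'): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return $url; // cannot parse it reliably, so leave it alone
    }
    $scheme = strtolower($parts['scheme'] ?? 'http');
    $host   = strtolower($parts['host']);
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    $path   = $parts['path'] ?? '/';

    // Strip a trailing index file so /blog/index.php and /blog/ become equal.
    $suffix = '/' . $indexFile;
    if (substr($path, -strlen($suffix)) === $suffix) {
        $path = substr($path, 0, -strlen($indexFile));
    }
    // Fragments (#...) are dropped on purpose: they point inside the same page.
    $query = isset($parts['query']) ? '?' . $parts['query'] : '';
    return $scheme . '://' . $host . $port . $path . $query;
}
```

With that in place, `canonicalUrl('http://www.domain.com/index.doc', 'index.doc')` and `canonicalUrl('http://www.domain.com/', 'index.doc')` both give `http://www.domain.com/`, so the two would share one node in the map.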
for each file (node)
    for each link-in to a file
        take the in-linking file's value
        divide that value by the linking file's outbound links
        add it
    0.15 + 0.85 * sum-of-additions = the new node value
Iterate 30 times over all nodes and you’re quite close to a realistic PageRank distribution map.
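for ($ii = 0; $ii < 30; $ii++) {
    foreach ($this->MyFiles as $Fl) {
        $val = 0; // reset the sum before visiting each node
        foreach ($Fl->MyLinksIn as $Li) {
            // assuming MyFiles is an array keyed by URL, so look up with []
            $In  = $this->MyFiles[$Li->url]->Value;
            $InT = $this->MyFiles[$Li->url]->LinkOutCount;
            $val = $val + $In / $InT;
        }
        // damping: 0.15 base value plus 0.85 of the summed contributions
        $Fl->Value = .15 + .85 * $val;
    }
}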
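To see the loop do something without the rest of the spider, here is a stand-alone sketch of the same calculation on a plain array. The function name `crappyRank`, the array layout and the three-page graph are made up for the example. Every page starts at 1.0, which is my assumption, since the start value isn't shown above.

```php
<?php
/**
 * Sketch of the iteration on a plain array graph.
 * $linksOut maps each page to the list of pages it links to.
 */
function crappyRank(array $linksOut, int $iterations = 30, float $damping = 0.85): array
{
    // Start every page at 1.0, including pages that only appear as link targets.
    $value = [];
    foreach ($linksOut as $from => $targets) {
        $value[$from] = 1.0;
        foreach ($targets as $to) {
            $value[$to] = 1.0;
        }
    }
    // Invert the graph: for each page, which pages link in?
    $linksIn = array_fill_keys(array_keys($value), []);
    foreach ($linksOut as $from => $targets) {
        foreach ($targets as $to) {
            $linksIn[$to][] = $from;
        }
    }
    for ($i = 0; $i < $iterations; $i++) {
        foreach ($linksIn as $page => $sources) {
            $sum = 0.0;
            foreach ($sources as $src) {
                // Each linking page passes on its value split over its outbound links.
                $sum += $value[$src] / count($linksOut[$src]);
            }
            // Values are updated in place, so later pages already see the new numbers.
            $value[$page] = (1 - $damping) + $damping * $sum;
        }
    }
    return $value;
}

// Made-up three-page site: a links to b and c, b links to c, c links back to a.
$graph = [
    'a' => ['b', 'c'],
    'b' => ['c'],
    'c' => ['a'],
];
$ranks = crappyRank($graph);
arsort($ranks);
print_r($ranks);
```

In this little graph c comes out on top (two pages link to it), a follows because it gets all of c's value, and b trails with only half of a's.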
So for the calculation I need:
- a file class with the collections LinksIn and LinksOut, plus variables for Value and LinkOutCount
- a class LinkIn
- a class LinkOut
- a spider function to grab the files and links
(I ditched the code listing; the echoed tables and line breaks break my layout.)
Then there are the real goodies:
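Since the real listing is gone, here is a rough sketch of what that shopping list could look like. It is not the original code. The property names follow the loop above, but `Link`, `PageFile`, `extractLinks` and `addPage` are names I made up. Fetching the HTML is left out, and relative URLs are stored exactly as found.

```php
<?php
// All class and function names are hypothetical; the original listing was not published.
class Link
{
    public $url;
    public function __construct(string $url) { $this->url = $url; }
}
class LinkIn extends Link {}
class LinkOut extends Link {}

class PageFile
{
    public $url;
    public $Value = 1.0;      // current PageRank estimate
    public $MyLinksIn = [];   // LinkIn objects: pages linking to this one
    public $MyLinksOut = [];  // LinkOut objects: pages this one links to
    public $LinkOutCount = 0;

    public function __construct(string $url) { $this->url = $url; }
}

/** Pull every href out of an HTML string, skipping empty and in-page (#) links. */
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href !== '' && $href[0] !== '#') {
            $urls[] = $href;
        }
    }
    return $urls;
}

/** Register one fetched page and its links in $files, which is keyed by URL. */
function addPage(array &$files, string $url, string $html): void
{
    $files[$url] = $files[$url] ?? new PageFile($url);
    foreach (extractLinks($html) as $target) {
        $files[$target] = $files[$target] ?? new PageFile($target);
        $files[$url]->MyLinksOut[] = new LinkOut($target);
        $files[$url]->LinkOutCount++;
        $files[$target]->MyLinksIn[] = new LinkIn($url);
    }
}
```

Call `addPage($files, $url, $html)` once per fetched page and `$files` ends up holding everything the calculation loop reads: Value, MyLinksIn and LinkOutCount per file.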
- a link validator
- a backlink check, including which URL each backlink points at
- a robots.txt class
- a sitemap routine
- a MySQL backend with store/restore/resume
- a domain stats grabber
- a trackback class
- a WordPress class
But that’s for Crappy 2.0 “Black Widow” (to be released into the wild in August 2008).






