tool : pagerank per url from a sitemap

I wired a google pagerank toolbar-query snippet to a simplexml sitemap readout and put it on a page. You fill in a sitemap url and get the google pageranks of all ‘mapped’ urls.

It works; I stripped it down, and you can download it here or on the sample page.

I mainly wanted the snippet wired to a sitemap to compare the results of my pagerank spider tool against an actual google readout, and running a sitemap through a toolbar-query snippet is the fastest way to do that.
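The sitemap side of this is simple. As an illustration (in Python rather than the PHP used below, and with a made-up sample sitemap), here is a minimal sketch that pulls the loc entries out of a sitemap:

```python
import xml.etree.ElementTree as ET

def sitemap_urls(xml_text):
    # sitemap files use the sitemaps.org namespace
    ns = {'sm': 'http://www.sitemaps.org/schemas/sitemap/0.9'}
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall('sm:url/sm:loc', ns)]

# hypothetical sample data, not the siteometrics sitemap
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/index.php</loc></url>
</urlset>"""

print(sitemap_urls(sample))
```

Each url you get back would then be fed to the toolbar-query routine one at a time.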

I already had a spider result for siteometrics (calc pr), so now I can compare it to google’s toolbar query on http://www.siteometrics.com/sitemap.xml :

google pr   calc pr   URL         ( - = no value recorded )
2           -         http://www.siteometrics.com/
2           0.80      /index.php
0           0.32      /advertise.html
-           0.77      /recommend.php
-           0.75      /search-engine-saturation.php
0           0.75      /link-popularity.php
0           0.75      /pagerank.php
0           0.75      /bulk-pagerank.php
0           0.75      /pagerank-mult-pages.php
0           0.75      /link-pop-pagerank.php
-           0.75      /link-search-pagerank.php
0           0.75      /alexa.php
0           0.75      /bulk-alexa.php
0           0.75      /serpcheck.php
0           0.75      /keyword-research.php
0           0.67      /visitor-info.php
-           0.24      /useful-links.html
0           0.24      /contact-us.html
-           0.24      /sitemap.html
-           0.24      /privacy-policy.html

Weird result: the sitemap they publish is part old site, part new site. If you check the pageranks on the newer .php files, though, they are all the same.

A quarter of the urls link into the archived site, which might cause the drop in pagerank (there are links to /feed and google.com on every page; see the other article on siteometrics).


for the freaks : here’s the php code (it assumes the url parameter is a valid sitemap url).


$myurl = $_REQUEST['url'];
$xml = simplexml_load_file($myurl);
foreach ($xml->url as $u) {
    echo pagerank((string) $u->loc) . "\n";
}
exit;

function pagerank($url) {
    if (!preg_match('/^(http:\/\/)?([^\/]+)/i', $url)) {
        $url = 'http://' . $url;
    }
    $pr = curl_getpr($url);
    return $pr . ';' . $url . ';';
}

function getch($url) {
    return CheckHash(HashURL($url));
}

function curl_getpr($url) {
    $googleua = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5';
    $ch = getch($url);
    $form = "http://toolbarqueries.google.com/search?client=navclient-auto&ch=$ch&features=Rank&q=info:$url";
    $cr = curl_init($form);
    curl_setopt($cr, CURLOPT_FAILONERROR, true);
    curl_setopt($cr, CURLOPT_HEADER, 0);
    curl_setopt($cr, CURLOPT_USERAGENT, $googleua); // spoof the user-agent
    curl_setopt($cr, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($cr);
    curl_close($cr);
    if (!$data) {
        return '-';
    }
    $pos = strpos($data, "Rank_");
    if ($pos === false) {
        return '-';
    }
    $pr = substr($data, $pos + 9);
    $pr = trim($pr);
    $pr = str_replace("\n", '', $pr);
    return $pr;
}

// PageRank Lookup v1.1 by HM2K (update: 31/01/07)
// based on an algorithm found at: http://pagerank.gamesaga.net/
// live demo: http://www.highrankforum.com/pagerank.php

// convert a string to a 32-bit integer
function StrToNum($Str, $Check, $Magic) {
    $Int32Unit = 4294967296; // 2^32
    $length = strlen($Str);
    for ($i = 0; $i < $length; $i++) {
        $Check *= $Magic;
        if ($Check >= $Int32Unit) {
            $Check = $Check - $Int32Unit * (int) ($Check / $Int32Unit);
            // if the check is less than -2^31, wrap it around
            $Check = ($Check < -2147483648) ? ($Check + $Int32Unit) : $Check;
        }
        $Check += ord($Str[$i]);
    }
    return $Check;
}

// generate a hash for a url
function HashURL($String) {
    $Check1 = StrToNum($String, 0x1505, 0x21);
    $Check2 = StrToNum($String, 0, 0x1003F);
    $Check1 >>= 2;
    $Check1 = (($Check1 >> 4) & 0x3FFFFC0) | ($Check1 & 0x3F);
    $Check1 = (($Check1 >> 4) & 0x3FFC00) | ($Check1 & 0x3FF);
    $Check1 = (($Check1 >> 4) & 0x3C000) | ($Check1 & 0x3FFF);
    $T1 = (((($Check1 & 0x3C0) << 4) | ($Check1 & 0x3C)) << 2) | ($Check2 & 0xF0F);
    $T2 = (((($Check1 & 0xFFFFC000) << 4) | ($Check1 & 0x3C00)) << 0xA) | ($Check2 & 0xF0F0000);
    return ($T1 | $T2);
}

// generate a checksum for the hash string
function CheckHash($Hashnum) {
    $CheckByte = 0;
    $Flag = 0;
    $HashStr = sprintf('%u', $Hashnum);
    $length = strlen($HashStr);
    for ($i = $length - 1; $i >= 0; $i--) {
        $Re = $HashStr[$i];
        if (1 === ($Flag % 2)) {
            $Re += $Re;
            $Re = (int)($Re / 10) + ($Re % 10);
        }
        $CheckByte += $Re;
        $Flag++;
    }
    $CheckByte %= 10;
    if (0 !== $CheckByte) {
        $CheckByte = 10 - $CheckByte;
        if (1 === ($Flag % 2)) {
            if (1 === ($CheckByte % 2)) {
                $CheckByte += 9;
            }
            $CheckByte >>= 1;
        }
    }
    return '7' . $CheckByte . $HashStr;
}
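For anyone who wants to poke at the hash/checksum logic outside PHP, here is a rough Python port of the StrToNum / HashURL / CheckHash part. This is my own translation, not verified against Google’s toolbar: PHP’s float arithmetic can diverge from Python’s exact integers on long strings, so treat it as a sketch of the algorithm, not a drop-in replacement.

```python
def str_to_num(s, check, magic):
    # accumulate a 32-bit-ish rolling value over the string's bytes
    INT32_UNIT = 4294967296  # 2**32
    for ch in s:
        check *= magic
        if check >= INT32_UNIT:
            check %= INT32_UNIT
        check += ord(ch)
    return check

def hash_url(url):
    # two rolling checks, then bit-shuffled into one hash value
    c1 = str_to_num(url, 0x1505, 0x21)
    c2 = str_to_num(url, 0, 0x1003F)
    c1 >>= 2
    c1 = ((c1 >> 4) & 0x3FFFFC0) | (c1 & 0x3F)
    c1 = ((c1 >> 4) & 0x3FFC00) | (c1 & 0x3FF)
    c1 = ((c1 >> 4) & 0x3C000) | (c1 & 0x3FFF)
    t1 = ((((c1 & 0x3C0) << 4) | (c1 & 0x3C)) << 2) | (c2 & 0xF0F)
    t2 = ((((c1 & 0xFFFFC000) << 4) | (c1 & 0x3C00)) << 0xA) | (c2 & 0xF0F0000)
    return t1 | t2

def check_hash(hashnum):
    # Luhn-style check digit over the decimal digits, prefixed with '7'
    check_byte = 0
    flag = 0
    hash_str = str(hashnum)
    for digit in reversed(hash_str):
        re = int(digit)
        if flag % 2 == 1:
            re += re
            re = re // 10 + re % 10
        check_byte += re
        flag += 1
    check_byte %= 10
    if check_byte != 0:
        check_byte = 10 - check_byte
        if flag % 2 == 1:
            if check_byte % 2 == 1:
                check_byte += 9
            check_byte >>= 1
    return '7' + str(check_byte) + hash_str
```

The '7' + check digit + hash string result is what goes into the ch= parameter of the toolbar query URL.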

seo : pagerank and serp part I

Today I go do something dumb, as usual! I still had to finish my serp tool, and as I was checking out a site’s performance I really needed it, so I added a permutation routine and a mysql backend to the serp tool and tied it to a domain-info class.

The serp run uses a set of three keys with permutations:

  • single  1, 2, 3
  • double  12, 21, 13, 31, 23, 32
  • triple  123, 132, 213, 231, 312, 321
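The permutation set is easy to generate. A sketch in Python with itertools (the join into query strings is my assumption of how the searches are issued):

```python
from itertools import permutations

keys = ['php', 'serp', 'pagerank']

# singles (3) + ordered pairs (6) + ordered triples (6) = 15 queries
queries = [' '.join(p)
           for r in (1, 2, 3)
           for p in permutations(keys, r)]

print(len(queries))  # 15
```

With the top 100 results fetched per query, 15 queries give the 1500 results mentioned below.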

That’s 15 searches; my chosen keys are php, serp, pagerank, and I get a full spread: 1500 results and +/- 680 different hosts.

Then I go do my old magic trick,

counting and scoring the results: spot 1-3 = 3 points, spot 4-6 = 2 points, spot 7-10 = 1 point, and the rest 0.2 points, so every domain gets a count of its results and a sum of the points it scores.
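The scoring step, sketched in Python (the function names and the sample results are mine, made up for illustration):

```python
from collections import defaultdict

def points(position):
    # spot 1-3 = 3 points, 4-6 = 2, 7-10 = 1, everything else 0.2
    if 1 <= position <= 3:
        return 3.0
    if 4 <= position <= 6:
        return 2.0
    if 7 <= position <= 10:
        return 1.0
    return 0.2

def tally(results):
    """results: (domain, 1-based serp position) pairs across all queries."""
    counts = defaultdict(int)
    scores = defaultdict(float)
    for domain, position in results:
        counts[domain] += 1
        scores[domain] += points(position)
    return counts, scores

counts, scores = tally([('a.example', 1), ('a.example', 12), ('b.example', 5)])
print(counts['a.example'], scores['a.example'])  # 2 3.2
```

Sorting the score sums descending gives the ranking shown in the result table.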

Then I know who is ‘top dog’ in my :) search engine result page.

Forums and communities (phpclasses, seomoz) have a broader spread: more pages with content and more titles, so they are bound to score on about every permutation, and they usually have a big pagerank (5 to 8). But what they gain in size they lose in strength and speed.

let’s check the result set :

domain results points pagerank pages backlinks dmoz (a trailing 1 marks a dmoz listing)
www.siteometrics.com 16 32.8 2 37 244
seolutions.net 16 26.8 3 43 986
forums.digitalpoint.com 20 13.2 7 4354858 147000 1
www.selfseo.com 13 12 3 884 17500
www.toptenserp.com 9 8.6 2 196 763
www.seroundtable.com 9 8.2 4 15053 203000
www.prchecker.info 8 6.8 8 323 1460000
www.seochat.com 10 6.6 4 12972 558000
www.webmasterworld.com 24 6.4 7 230611 182000
marketingfeeds.nl 2 6 4 37237 6210
www.getfreesofts.com 13 5.4 4 275521 34900
www.shoemoney.com 3 5.2 6 12774 105000
video.aol.com 3 5.2 8 14073952 3300000 1
www.rankwhere.com 3 5.2 4 228 919
www.ljfind.com 3 5.2 3 1771559 1800
forum.siteground.com 9 5 7 30050 4570
www.phpclasses.org 6 4.8 6 150356 88100
www.hotscripts.com 6 4.8 7 219323 461000
hosthideout.com 7 4.6 5 161532 2150 1
www.top25web.com 4 4.4 3 143 15900
juustout.gethost.nl 8 4 0 173 184
www.webmasterforums.com 9 3.6 3 50550 45300
sitening.com 8 3.4 6 2714 29700 1
www.phplivesupport.com 3 3.4 7 1463 120000
www.webworkshop.net 4 3.4 5 119233 15100
www.google-pagerank.net 4 3.4 5 318 20200
www.database-search.com 3 3.4 3 14077 195000
en.wikipedia.org 2 3.2 9 208719522 56700000 1
www.nap.edu 3 3.2 9 2458 227000
www.php.net 1 3 9 170567 15700000
www.webopedia.com 1 3 5 35722 2170000
www.cs.und.edu 2 3 6 195 409
www.investopedia.com 1 3 4 19828 191000
www.smartpagerank.com 6 2.8 3 12051 25400
www.phpbits.info 5 2.6 2 22 2940
www.programsdb.com 13 2.6 4 236295 37700
www.serp-chem.eu 4 2.4 6 18 11400
www.pagerankcode.com 4 2.4 2 6 211
www.searchenginepanel.com 12 2.4 3 84 2590
www.devpapers.com 3 2.4 5 2196 152000
www.googlecommunity.com 2 2.2 5 251463 4150
www.webmasters.am 11 2.2 5 15587 3330
www.sitepoint.com 11 2.2 7 2134801 371000
link.ezer.com 2 2.2 4 2094 3180
forums.searchenginewatch.com 11 2.2 7 86027 142000 1
livepr.raketforskning.com 2 2.2 3 105136 122000
searchengineland.com 10 2 7 41187 442000 1
www.talkdigger.com 1 2 6 743479 145000

…which supports the idea that you don’t need massive backlink counts; even pagerank 0 (my other site, gloat) is no hindrance. Pagerank distribution in small sites is easier to manage, so a few backlinks will do.

Next edition, seo : pagerank and serp : part II. I’ll pick a few small sites from the list and spider them, assess their link structure, retrieve the backlinks, and spider the linking pages to see which links point directly to which urls, then make a “push-and-juice” analysis of the actual strength of those pages. Then I’ll relate that to the position on the search engine result page and estimate what is needed for a top-10 spot.

enough nonsense for today!