pagerank of directory sites
juust | August 15, 2008
I keep getting N/A as the pagerank of the Gethost subdomain, so I ran a check through Yahoo's indexed pages and, to my surprise, got a pagerank 2 on the blog index (the rest ain’t ranked).
I thought about using a spider to retrieve the pages, but that’s too slow, and not everyone uses a sitemap. Then I remembered I had a routine to get the indexed pages from Yahoo Site Explorer, so I might as well use that one. Works fine.
I’ll put an ajax version online next week.
First I am going to wire it to my directory list and check which directories have a nice juicy “web development and programming” category. Yahoo keeps a max 5,000 query limit, approx. max 200,000 pages a day, so it might take a while to map the whole directory circus…
rev. 16-08
I first tried 1000 pages per directory, but the pagerank query makes it slow, and after 16,000 result pages I had 100 ranked pages, 0.6%. Another run of 7700 pages (this time only 50 per directory, pot luck) returned 200 ranked pages, about 2.6%.
Mostly it’s the index and the links on the index page (top level categories) that are indexed; the rest don’t rank. Only the older sites have ‘deeper’ ranking pages (and Dmoz, which ranks PR5/6 on all inner pages).
Pages that aren’t ranked don’t contribute any PR value, which is why directory submission is cheap (.15cts per link) whereas a ranking anchor is between $5 and $20 per month, just like a regular ad.
I tried different link schemes on a blog on a subdomain for a while, and now I am gonna play with a link directory, try a few configs and see which one yields the fastest and highest rank distribution, and what gets the site indexed.
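For anyone who wants to redo the sums on their own runs, here is a minimal sketch. The function name ranked_percentage() is made up for this example; the numbers are the two runs above. The first run comes out at 0.6%, the second near 2.6%.

```php
<?php
// Share of checked pages that carry a pagerank value, as a percentage.
// ranked_percentage() is a hypothetical helper, not part of any library.
function ranked_percentage($ranked, $checked) {
    if ($checked <= 0) {
        return 0.0; // nothing checked, nothing to report
    }
    return round($ranked / $checked * 100, 1);
}

echo ranked_percentage(100, 16000), "%\n"; // first run
echo ranked_percentage(200, 7700), "%\n";  // second run
```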
links.trismegistos.net link directory : sitemap
The ways to get a site indexed are feeding a sitemap to google, yahoo, ask and msn, firing the urls one by one at the smaller url-submit engines, getting other indexed sites to link to the pages (del.icio.us, twemes.com, any indexed bookmark site), or reciprocals from listed sites.
I am too lazy to dump every url on a bookmark site, and using a sitemap is more proper webmaster stuff, so I decided to make a sitemap.xml for my directory. I ain’t gonna type every url in notepad, so I programmed a quick tree traversal on the category table of phpLD and added a sitemap format (note: this works for sites with mod_rewrite, through the TITLE_URL field).
    <?php
    // Build a sitemap from the phpLD category table (mod_rewrite urls via TITLE_URL).
    $root = 'http://links.trismegistos.net';

    $link = connect();

    $content  = "<?xml version='1.0' encoding='UTF-8'?>\n";
    $content .= "<urlset xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd' xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>\n";
    $content .= read($link, 0, $root);
    $content .= "</urlset>\n";

    mysqli_close($link);
    echo $content;

    // Walk the category tree depth-first and return one <url> entry per category.
    function read($link, $rootid, $pathid) {
        $mycontent = '';
        $myqry = "SELECT * FROM `PLD_CATEGORY` WHERE `PARENT_ID`='" . (int)$rootid . "'";
        //echo $myqry;
        $myres = mysqli_query($link, $myqry) or die(mysqli_error($link));
        if (mysqli_num_rows($myres) == 0) return $mycontent;
        while ($row = mysqli_fetch_assoc($myres)) {
            $thispath = $pathid . "/" . $row['TITLE_URL'];
            $mycontent .= "<url>\n";
            $mycontent .= chr(9) . "<loc>" . htmlspecialchars($thispath) . "/</loc>\n";
            $mycontent .= chr(9) . "<lastmod>" . gmdate('Y-m-d', time()) . "T01:01:14+00:00</lastmod>\n";
            $mycontent .= chr(9) . "<changefreq>daily</changefreq>\n";
            $mycontent .= chr(9) . "<priority>1.0</priority>\n";
            $mycontent .= "</url>\n";
            // recurse into the subcategories of this category
            $mycontent .= read($link, $row['ID'], $thispath);
        }
        return $mycontent;
    }

    // Open one database connection that is shared by the whole traversal.
    function connect() {
        $DB_USER = "*******";
        $DB_PASSWORD = "*******";
        $DB_HOST = "*******";
        $DB_DATA = "*******";
        $link = mysqli_connect($DB_HOST, $DB_USER, $DB_PASSWORD, $DB_DATA);
        if (!$link) {
            echo mysqli_connect_error();
            exit;
        }
        return $link;
    }
(rev 17-08 changed the “
That returns a page with the sitemap as its source, which you can cut and paste as sitemap.xml and feed to google or ping to msn and yahoo (at msn I’d get an account; they also have a webmaster section these days).
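If cutting and pasting gets old, the script could also write the file itself. A minimal sketch, assuming the webserver may write to the target folder; save_sitemap() is a made-up helper name, and the stub $content stands in for the string the script builds.

```php
<?php
// Write the generated sitemap XML to a file; returns true on success.
// save_sitemap() is a hypothetical helper, not part of phpLD.
function save_sitemap($content, $path) {
    return file_put_contents($path, $content) !== false;
}

// Assumption: $content holds the XML built by the sitemap script; a stub here.
$content = "<?xml version='1.0' encoding='UTF-8'?>\n"
         . "<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>\n"
         . "</urlset>\n";

// The temp folder keeps the demo harmless; point it at the web root for real use.
save_sitemap($content, sys_get_temp_dir() . '/sitemap.xml');
```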
You might get a warning on the dates;
in that case use only
$mycontent .= chr(9).”
and search/replace the date with some proper date format (rip it from a wordpress sitemap). I did.
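Instead of search/replace, the date could be generated in a valid format straight away. A minimal sketch, assuming a full W3C datetime in UTC is what the validator wants (the sitemap protocol also accepts a plain YYYY-MM-DD); lastmod_line() is a made-up helper name.

```php
<?php
// Build a <lastmod> line with a full W3C datetime in UTC.
// lastmod_line() is a hypothetical helper name.
function lastmod_line($timestamp) {
    // gmdate('c') yields ISO 8601, e.g. 2008-08-17T09:30:00+00:00
    return chr(9) . "<lastmod>" . gmdate('c', $timestamp) . "</lastmod>\n";
}

echo lastmod_line(time());
```

In the traversal it would take the place of the hand-built lastmod line.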
site links
I put the footer links, rss feed and ‘submit’ links on ‘nofollow’, and put my own sites in a top category as featured links instead of in the main template.
Otherwise the site bleeds all over the place, especially if you keep a few outbound ‘follow’ links on a page like the submit page (where you have six or seven other links and bleed 15 to 30% in one page; for category pages the loss is far less).
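In a template that boils down to something like the sketch below. site_link() and the '/submit.php' and '/rss.php' urls are assumptions for the example, not phpLD's own template code.

```php
<?php
// Render an anchor; links are nofollow unless $follow is true.
// site_link() is a hypothetical helper, not a phpLD function.
function site_link($url, $text, $follow = false) {
    $rel = $follow ? '' : ' rel="nofollow"';
    return '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '"' . $rel . '>'
         . htmlspecialchars($text, ENT_QUOTES) . '</a>';
}

echo site_link('/submit.php', 'Submit Link'), "\n";  // footer link, nofollow
echo site_link('/rss.php', 'RSS'), "\n";             // feed link, nofollow
echo site_link('http://links.trismegistos.net/', 'Featured', true), "\n"; // followed
```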
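A rough way to put a number on the bleed, assuming the simplified model where a page splits its rank evenly over all followed links (damping ignored). The link totals per page are made-up examples and outbound_share() is a hypothetical helper.

```php
<?php
// Percentage of a page's followed links that point off-site.
// Simplified model: every followed link gets an equal share of the page's rank.
function outbound_share($outbound, $total) {
    if ($total <= 0) {
        return 0.0; // page without followed links passes nothing on
    }
    return round($outbound / $total * 100, 1);
}

echo outbound_share(6, 40), "%\n"; // 6 outbound of 40 followed links
echo outbound_share(7, 35), "%\n"; // 7 outbound of 35 followed links
echo outbound_share(7, 23), "%\n"; // 7 outbound of 23 followed links
```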
links back
The other ways, as mentioned, are bookmarking sites or reciprocal links from listed sites (that only works if the page linking back is indexed).
Links back to the index pump that node up (and the links off that page, the top-category level) but not the all-important pages with the actual links, so links back to the index are useless for the users. A link back to the supplying category page works a lot better for the rest of the users.
One directory site got the point: they allowed a link back to the category page itself. If the link back is on an indexed page, the directory category page is crawled from that point and may get indexed and ranked a lot faster. Works better for both parties.
So we’ll see in six months what the pagerank of the index is and how many of the subpages are ranked and indexed.






