about the trackback thing

The question with the trackbacks in the trends script was whether a few hundred backlinks were worth the trouble, and they weren’t. So I wrote a second routine that grabs the most common significant words from the excerpts and runs a second search with them, to get better results and up to five trackbacks per page.
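Picking the most common significant words could be done roughly like this. It is only a sketch: the function name, the stop-word list and the three-letter minimum are assumptions for illustration, not the routine the script actually uses.

```php
<?php
// Hypothetical helper: returns the $limit most frequent "significant" words.
// The stop-word list and the minimum word length are assumptions.
function top_significant_words(array $excerpts, int $limit = 3): array
{
    $stopWords = ['the', 'and', 'for', 'that', 'with', 'this', 'from',
                  'have', 'was', 'are', 'his', 'her', 'has', 'not', 'but'];

    // Strip markup and lowercase everything before counting.
    $text = strtolower(strip_tags(implode(' ', $excerpts)));

    // Split on anything that is not a letter or a digit.
    $words = preg_split('/[^a-z0-9]+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $counts = [];
    foreach ($words as $word) {
        if (strlen($word) < 3 || in_array($word, $stopWords, true)) {
            continue; // too short or too common to be significant
        }
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }

    arsort($counts); // most frequent first

    // Numeric words end up as integer keys, so cast back to strings.
    return array_map('strval', array_slice(array_keys($counts), 0, $limit));
}
```

The result can be glued together with `implode(' ', ...)` and run through `urlencode()` to build the query for the second search.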

So I put that online. It grabbed 4000 backlinks in an hour and overloaded the host server.

Baidu, Radian6 and Google had stepped up indexing after I added sitewide tags, and that didn’t show up in analytics. The site got the trackback validations and the crawlers at the same time, and the server went haywire. It is a shared host, and the resources are too limited to run that kind of site on. I put it on hold till I find a solution for the hosting.

Google of course penalised the site with PR0 and dropped the domain from the SERPs for its main keywords, but in Yahoo it ranks about 20 out of 360 million result pages and in MSN it ranks no. 1. I am thinking about adding a translator plugin to see if I can get some traffic from Baidu.

 

trackbacks

Trackbacks are brilliant stuff. I programmed a trackback module into the trends script yesterday just to see what it yields. As long as you don’t use it to spam and stick to common standards, it’s the fastest deep link building method available. I noticed another trends script is also using trackbacks.
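For reference, a trackback ping following the common standard is an HTTP POST with form-encoded `url`, `title`, `excerpt` and `blog_name` fields, answered with a small XML document in which `<error>0</error>` means the ping was accepted. The sketch below shows that exchange under those assumptions. The three function names are made up for illustration and this is not the module from the trends script.

```php
<?php
// Build the form-encoded body of a trackback ping.
function build_trackback_body(string $url, string $title, string $excerpt, string $blogName): string
{
    return http_build_query([
        'url'       => $url,       // permalink of the page that links to the blog
        'title'     => $title,
        'excerpt'   => $excerpt,
        'blog_name' => $blogName,
    ], '', '&');
}

// A ping was accepted when the XML response contains <error>0</error>.
function trackback_accepted(string $responseXml): bool
{
    return preg_match('!<error>\s*0\s*</error>!i', $responseXml) === 1;
}

// Send the ping to the blog's trackback URL; true when the blog accepted it.
function send_trackback(string $pingUrl, string $body): bool
{
    $context = stream_context_create(['http' => [
        'method'  => 'POST',
        'header'  => "Content-Type: application/x-www-form-urlencoded; charset=utf-8\r\n",
        'content' => $body,
        'timeout' => 10, // assumption: give slow blogs ten seconds at most
    ]]);

    $response = @file_get_contents($pingUrl, false, $context);

    return $response !== false && trackback_accepted($response);
}
```

Counting the `true` results of `send_trackback()` over a batch gives the success rate.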

GTrends lists an average of 600 different searches per day, which makes about 200K pages a year. If you put five blog excerpts with a link on each page, you have 1000K backlink opportunities a year, automated, if you use trackbacks.
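A quick back-of-the-envelope check of those numbers, as a sketch. The function name is hypothetical, and it assumes one page per search and 365 days of 600 new searches each.

```php
<?php
// Hypothetical helper: yearly number of backlinks for a given success rate.
function yearly_backlinks(int $searchesPerDay, int $linksPerPage, float $successRate): int
{
    $pagesPerYear  = $searchesPerDay * 365;         // 600 * 365 = 219000 pages
    $opportunities = $pagesPerYear * $linksPerPage; // 219000 * 5 = 1095000 links

    return (int) round($opportunities * $successRate);
}

echo yearly_backlinks(600, 5, 1.0), "\n"; // every trackback accepted
echo yearly_backlinks(600, 5, 0.3), "\n"; // 30% accepted
```

So the rounded "200K pages" and "1000K opportunities" are slightly on the low side of 219,000 and 1,095,000.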

I got a 50% success rate in the first tests, so I put it on a cronjob, and it seems to level out at 30% successful links. That seemed a bit much, so I checked the PingCrawl plugin that Eli (bluehatseo) and joshteam put together for WordPress. They claim an 80% success rate using Eli’s result scraper, so I guess 30% is not aberrant.

For trends, I can’t narrow my search down too much. I need the most recent blogs for the trends buzz. Searches that are too narrow might exclude the recent news, and the script would lose its usability. Besides, I figure 10% trackbacks would already be more than enough: a few hundred lines of code with a CSS template for 100K backlinks a year ain’t bad.

I don’t actually have anything to blog about today, so that’s it.

[added 3-3] ****ing brilliant, 65% trackbacks are accepted, increasing traffic, bots come crawling, finally something that works. Now add proxies.

[added 3-3] bozo style “the script got 4 uniques yesterday!”

Can I be honest? The dude over at seounderworld gave me a vote of confidence on the trends script, and I felt embarrassed because the demo looked like shit and didn’t do anything. For scraper basics it was fine, but it lacked SEO potential.

So I added some CSS, validated the source, added caching, gzip, rss-feed, sitemap, and the trackback module. It got 300 uniques yesterday and 400 uniques this morning on its first day out, so it performs better now and I don’t feel so embarrassed anymore.

(nice impression of the trends audience by the way)

I’ll add some proxies to prevent bans and some other stuff. Once that’s done I’ll refresh the download.
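Routing the requests through a proxy could look something like this. It is a minimal sketch, assuming plain HTTP proxies and PHP’s stream context options; both function names and the proxy addresses are hypothetical placeholders, not part of the script.

```php
<?php
// Build the HTTP context options for one proxy given as "host:port".
function proxy_context_options(string $proxy): array
{
    return ['http' => [
        'proxy'           => 'tcp://' . $proxy,
        'request_fulluri' => true, // most HTTP proxies want the full URL in the request line
        'timeout'         => 10,   // assumption: give up on slow proxies after ten seconds
    ]];
}

// Fetch a URL through a randomly picked proxy; returns the body or false.
function fetch_via_proxy(string $url, array $proxies)
{
    $proxy   = $proxies[array_rand($proxies)];
    $context = stream_context_create(proxy_context_options($proxy));

    return @file_get_contents($url, false, $context);
}

// Placeholder addresses, replace with real proxies before use:
// $html = fetch_via_proxy('http://www.example.com/', ['10.0.0.1:8080', '10.0.0.2:3128']);
```

Picking a random proxy per request spreads the requests over several addresses, which is the point when the goal is to avoid bans.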

google trends III

How to get the URLs and snippets from the Google Trends details page. The news articles on the details page are loaded with an ‘Ajax’ call; they are not sent to the browser in the HTML source, so there is no easy way to scrape them.

The blog articles are pretty straightforward. First the ugly, fast way:

    $mytitle = 'manuel benitez';
    $mydate = ''; // e.g. 2008-12-24
    $html = file_get_contents('http://www.google.com/trends/hottrends?q='.urlencode($mytitle).'&date='.urlencode($mydate).'&sa=X');
    // cut out everything between the results box and the "more results" block
    $start = strpos($html, '<div class="gsc-resultsbox-visible">');
    $end = strpos($html, '<div class="gsc-trailing-more-results">');
    $content = substr($html, $start, $end - $start);
    echo $content;

That returns the blog snippets, ugly. The other way is regular expression matching: you can grab the divs that each content item has, marked with

  • div class="gs-title"
  • div class="gs-relativePublishedDate"
  • div class="gs-snippet"
  • div class="gs-visibleUrl"

from the HTML source and organize them into an array of “Content” objects, after which you can list the content items with your own markup or store them in a database.

    // I assume $mytitle is taken from the $_GET array.

    // class 'Content' with its members
    class Content {
        public $id;
        public $title;
        public $pubdate;
        public $snippet;
        public $url;

        public function __construct($id) {
            $this->id = $id;
        }
    }

    // grab the source from the Google page
    $html = file_get_contents('http://www.google.com/trends/hottrends?q='.urlencode($mytitle).'&date=&sa=X');

    // cut out the part I want; fall back to the whole page if a marker is missing
    $start = strpos($html, '<div class="gsc-resultsbox-visible">');
    $end = strpos($html, '<div class="gsc-trailing-more-results">');
    $content = ($start !== false && $end !== false) ? substr($html, $start, $end - $start) : $html;

    // grab the divs that contain title, publish date, snippet and url with regular expression matching
    preg_match_all('!<div class="gs-title">.*?</div>!si', $content, $titles);
    preg_match_all('!<div class="gs-relativePublishedDate">.*?</div>!si', $content, $pubDates);
    preg_match_all('!<div class="gs-snippet">.*?</div>!si', $content, $snippets);
    preg_match_all('!<div class="gs-visibleUrl">.*?</div>!si', $content, $urls);

    $Contents = array();

    // organize them under Content

    $count = 0;
    foreach ($titles[0] as $title) {
        // make a new instance of Content and add the title
        $Contents[] = new Content($count);
        $Contents[$count]->title = $title;
        $count++;
    }

    $count = 0;
    foreach ($pubDates[0] as $pubDate) {
        // add publishing date (contains some linebreak markup, remove it with strip_tags)
        $Contents[$count]->pubdate = strip_tags($pubDate);
        $count++;
    }

    $count = 0;
    foreach ($snippets[0] as $snippet) {
        // add snippet
        $Contents[$count]->snippet = $snippet;
        $count++;
    }

    $count = 0;
    foreach ($urls[0] as $url) {
        // add display url
        $Contents[$count]->url = $url;
        $count++;
    }

    // the number of content items (the array is 0-based)
    $count = count($Contents);

    // add rel=nofollow to links to prevent pagerank assignment to blogs
    for ($ct = 0; $ct < $count; $ct++) {
        $Contents[$ct]->url = preg_replace('/ target/', ' rel="nofollow" target', $Contents[$ct]->url);
        $Contents[$ct]->title = preg_replace('/ target/', ' rel="nofollow" target', $Contents[$ct]->title);
    }

    // it's complete, list all content items with some markup
    for ($ct = 0; $ct < $count; $ct++) {
        echo '<h3>'.$Contents[$ct]->title.'</h3>';
        echo '<p><strong>'.$Contents[$ct]->pubdate.'</strong>:<em>'.$Contents[$ct]->snippet.'</em></p>';
        echo $Contents[$ct]->url.'<br />';
    }

It ain’t perfect, but it works. The highlighter I use gets a bit confused about the preg_match_all statements containing unclosed divs, so copying the code off the blog may not work.
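If the regular expressions turn out too fragile, the same four divs can also be picked up with PHP’s DOM extension instead. This is a different technique from the one above, sketched under the assumption that the class names are exactly the ones listed earlier. The function name is made up, and it keeps only the plain text of each div, so the links inside the title are dropped.

```php
<?php
// Hypothetical alternative: parse the result divs with DOMDocument and XPath.
function parse_items(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // scraped markup is rarely clean, keep warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath  = new DOMXPath($doc);
    $fields = [
        'title'   => 'gs-title',
        'pubdate' => 'gs-relativePublishedDate',
        'snippet' => 'gs-snippet',
        'url'     => 'gs-visibleUrl',
    ];

    $items = [];
    foreach ($fields as $field => $class) {
        $i = 0;
        // The n-th div of each class belongs to the n-th content item.
        foreach ($xpath->query('//div[@class="' . $class . '"]') as $node) {
            $items[$i][$field] = trim($node->textContent);
            $i++;
        }
    }

    return $items;
}
```

Each entry of the returned array is a plain associative array with `title`, `pubdate`, `snippet` and `url` keys, which is easy to loop over or store in a database.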