curl trackbacks

I figured I’d blog a post on trackback linkbuilding. A trackback is … (post a few and you’ll get it). The trackback protocol isn’t that interesting, but the way blog platforms and cms’es implement it makes it an excellent means for network development, because it only takes a simple http POST, and cURL makes that easy.
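The exchange itself is tiny. A sketch of a trackback post and its reply, with placeholder URLs and the field names from the trackback spec (url, title, blog_name, excerpt) :

  POST /some-post/trackback/ HTTP/1.1
  Host: www.example.com
  Content-Type: application/x-www-form-urlencoded; charset=utf-8

  url=http%3A%2F%2Fwww.mydomain.com%2Fmy-page.html&title=my+page+title&blog_name=my+blog&excerpt=some+matching+text

The receiving script answers with a small piece of XML; an error value of 0 means the link proposal was accepted :

  <?xml version="1.0" encoding="utf-8"?>
  <response><error>0</error></response>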

To post a successful link proposal I need some basic data :

about my page

  • url (must exist)
  • blog owner (free to choose)
  • blog name (free to choose)

about the other page

  • url (must exist)
  • excerpt (should be proper normal text)

my page : this is preferably a php routine that hacks some text, pictures, videos, PLR or articles together, with a url rewrite. I prefer using xml text files instead of a database; that works faster when you set things up.

other page : don’t use “I liked your article so much…”; use text that matches text on the target page. Preferably get some proper excerpts from xml feeds like blogsearch, msn and yahoo (excerpts contain the keywords I searched for, and as anchor text that works better for search engine visibility and link value).

Let’s get some stuff from the MSN rss feed :

  //a generic query = 5% success
  //add "(powered by) wordpress" to raise the odds
  $query = urlencode('keywords wordpress trackback');
  $xml = @simplexml_load_file("http://search.live.com/results.aspx?q=$query&count=50&first=1&format=rss");
  $count = 0;
  $trackbacks = array();
  foreach ($xml->channel->item as $i) {

      $count++;

      //the data from msn
      $target['link'] = (string) $i->link;
      $target['title'] = (string) $i->title;
      $target['excerpt'] = (string) $i->description;

      //some variables I'll need later on
      $target['id'] = $count;
      $target['trackback'] = '';
      $target['trackback_success'] = 0;

      $trackbacks[] = $target;
  }

25% of the cms sites in the top of the search engines are WordPress scripts, and WordPress always uses /trackback/ in the rdf-url. I fetch the source of every url in the search feed and grab all the link urls in it; if any contains /trackback/, I post a trackback to that url and see if it sticks.

(I could also spider all links and check if there is an rdf-segment in the target’s source (*1), but that takes a lot of time. Another option is to program a curl array and use multicurl; for my purposes this works fast enough.)
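For the curious, a minimal multicurl sketch of that idea, fetching all candidate pages in parallel. It assumes the $trackbacks array built above; the 'source' key is mine, just for illustration :

  $mh = curl_multi_init();
  $handles = array();
  foreach ($trackbacks as $t => $target) {
      $ch = curl_init($target['link']);
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
      curl_setopt($ch, CURLOPT_TIMEOUT, 3);
      curl_multi_add_handle($mh, $ch);
      $handles[$t] = $ch;
  }

  //run all transfers and wait until they are done
  $running = null;
  do {
      curl_multi_exec($mh, $running);
      usleep(10000); //don't hog the cpu while polling
  } while ($running > 0);

  //collect the page sources and clean up
  foreach ($handles as $t => $ch) {
      $trackbacks[$t]['source'] = curl_multi_getcontent($ch);
      curl_multi_remove_handle($mh, $ch);
      curl_close($ch);
  }
  curl_multi_close($mh);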

  for ($t = 0; $t < count($trackbacks); $t++) {
      //I could use curl here too,
      //but 95% of the urls offered are kosher and respond fast
      $content = @file_get_contents($trackbacks[$t]['link']);
      preg_match_all("/a[\s]+[^>]*?href[\s]?=[\s\"']+(.*?)[\"']+.*?>([^<]+|.*?)?<\/a>/",
          $content, $matches);
      $uri_array = $matches[1];
      foreach ($uri_array as $key => $link) {
          if (strpos($link, 'rackbac') !== false) {
              $trackbacks[$t]['trackback'] = $link;
              break;
          }
      }
  }

When I fire a trackback, the other script will try to verify that my page has a link and matching text. I have to make sure my page shows the excerpts and links, so I stuff all candidates in a cached xml file.

  function cache_xml_store($trackbacks, $pagetitle)
  {
      $xml = '<?xml version="1.0" encoding="UTF-8"?><trackbacks>';
      for ($a = 0; $a < count($trackbacks); $a++) {
          $arr = $trackbacks[$a];
          $xml .= '<entry>';
          $xml .= '<id>'.$arr['id'].'</id>';
          //escape the values so the xml stays well-formed
          $xml .= '<excerpt>'.htmlspecialchars($arr['excerpt']).'</excerpt>';
          $xml .= '<link>'.htmlspecialchars($arr['link']).'</link>';
          $xml .= '<title>'.htmlspecialchars($arr['title']).'</title>';
          $xml .= '</entry>';
      }
      $xml .= '</trackbacks>';

      $fname = 'cache/trackback'.urlencode($pagetitle).'.xml';
      if (file_exists($fname)) unlink($fname);
      $fhandle = fopen($fname, 'w');
      fwrite($fhandle, $xml);
      fclose($fhandle);
      return;
  }
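A cache file written by this routine comes out roughly like this (placeholder values; whitespace added for readability) :

  <?xml version="1.0" encoding="UTF-8"?>
  <trackbacks>
    <entry>
      <id>1</id>
      <excerpt>…the excerpt text from the search feed…</excerpt>
      <link>http://www.example.com/some-post/</link>
      <title>some post title</title>
    </entry>
  </trackbacks>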

I use simplexml to read that cached file and show the excerpts and links once the page is requested.

  // retrieve the cached xml and return it as an array.
  function cache_xml_retrieve($pagetitle)
  {
      $fname = 'cache/trackback'.urlencode($pagetitle).'.xml';
      if (file_exists($fname)) {
          $xml = @simplexml_load_file($fname);
          if (!$xml) return false;
          $trackbacks = array();
          foreach ($xml->entry as $e) {
              $trackback['id'] = (string) $e->id;
              $trackback['link'] = (string) $e->link;
              $trackback['title'] = (string) $e->title;
              $trackback['excerpt'] = (string) $e->excerpt;

              $trackbacks[] = $trackback;
          }
          return $trackbacks;
      }
      return false;
  }

(this setup requires a subdirectory cache set to read/write with chmod 777)

I use http://www.domain.com/financial+trends.html and extract the pagetitle as “financial trends”, which maps to the cached xml file http://www.domain.com/cache/trackbackfinancial+trends.xml. (In my own script I use sef urls with mod_rewrite; you can also use the $_SERVER array.)

  $pagetitle = preg_replace('/\+/', ' ', htmlentities($_REQUEST['title'], ENT_QUOTES, "UTF-8"));

  $cached_excerpts = cache_xml_retrieve($pagetitle);

  //do some stuff with it, make it look nice :
  for ($s = 0; $s < count($cached_excerpts); $s++) {
      //this lists the trackback (candidates)
      echo $cached_excerpts[$s]['excerpt'];
      echo '<a href="'.$cached_excerpts[$s]['link'].'">'.$cached_excerpts[$s]['title'].'</a>';
  }
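For the sef-url variant, a rough sketch of pulling the pagetitle straight from the request instead of $_REQUEST['title'], assuming urls like /financial+trends.html (mod_rewrite setups differ) :

  $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
  //"financial+trends.html" becomes "financial trends"
  $pagetitle = urldecode(basename($path, '.html'));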

Now I prepare the data for the trackback post :

  for ($t = 0; $t < count($trackbacks); $t++) {

      $trackback_url = $trackbacks[$t]['trackback'];
      //does it have a trackback target url ? then prepare the data :
      if ($trackback_url != '') {
          $trackback_data = array(
              "url" => "url of my page with the link to the target",
              "title" => "title of my page",
              "blog_name" => "name of my blog",
              "excerpt" => '[…]'.trim(substr($trackbacks[$t]['excerpt'], 0, 150)).'[…]'
          );
          //…and try the trackback
          $trackbacks[$t]['trackback_success'] = trackback_ping($trackback_url, $trackback_data);
      }
  }

This is the actual trackback post using cUrl. cUrl has a convenient timeout setting; I use three seconds. If a host does not respond within half a second it’s probably dead, so three seconds is generous.

  function trackback_ping($trackback_url, $trackback)
  {
      //make a string of the data array to post
      $strout = array();
      foreach ($trackback as $key => $value) $strout[] = $key."=".rawurlencode($value);
      $postfields = implode('&', $strout);

      //create a curl instance
      $ch = curl_init();
      curl_setopt($ch, CURLOPT_URL, $trackback_url);
      curl_setopt($ch, CURLOPT_TIMEOUT, 3);
      curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
      //return the response body: I need it to check the error tag
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

      //set a custom form header
      curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-type: application/x-www-form-urlencoded'));

      curl_setopt($ch, CURLOPT_POST, true);
      curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);

      $content = curl_exec($ch);

      //if the response has an 'error' tag with value 0 it went flawless
      $success = 0;
      if (strpos($content, '<error>0') !== false) $success = 1;
      curl_close($ch);
      unset($ch);
      return $success;
  }

Now the last routine : rewrite the cached xml file with only the successful trackbacks (seo stuff) :

  $store_trackbacks = array();
  for ($t = 0; $t < count($trackbacks); $t++) {
      if ($trackbacks[$t]['trackback_success'] > 0) {
          $store_trackbacks[] = $trackbacks[$t];
      }
  }
  cache_xml_store($store_trackbacks, $pagetitle);

voila : a page with only successful trackbacks.

Google (the backrub engine) doesn’t like sites that use automated link-building methods; other engines (Baidu, MSN, Yahoo) use a more conventional link popularity and keyword matching algorithm. Trackback linking helps you get a clear engine profile at relatively low cost.

0) for brevity and clarity, the code above is rewritten (taken from a trackback script I am developing on another site); it can contain some typos.

*1) If you want to spider links for rdf-segments : TYPO3 v4 has some code for easy retrieval of trackback-uri’s :

  /**
   * Fetches ping url from the given url
   *
   * @param string $url URL to probe for RDF
   * @return string Ping URL
   */
  protected function getPingURL($url) {
      $pingUrl = '';
      // Get URL content
      $urlContent = t3lib_div::getURL($url);
      if ($urlContent && ($rdfPos = strpos($urlContent, '<rdf:RDF')) !== false) {
          // RDF exists in this content. Get it and parse
          $urlContent = substr($urlContent, $rdfPos);
          if (($endPos = strpos($urlContent, '</rdf:RDF>')) !== false) {
              // We will use a quick regular expression to find the ping URL
              $rdfContent = substr($urlContent, 0, $endPos);
              if (preg_match('/trackback:ping="([^"]+)"/', $rdfContent, $matches)) {
                  $pingUrl = $matches[1];
              }
          }
      }
      return $pingUrl;
  }

seo tricks : the magpie incident

Some universities like Southern California, Harvard and Michigan State have their web gurus explain to us how rss feeds work, with the elegant Magpie parser demo :

Some example on how to use Magpie:

* magpie_simple.php *
Simple example of fetching and parsing an RSS file. Expects to be
called with a query param ‘rss_url=http://(some rss file)’
….

* magpie_debug.php *
Displays all the information available from a parsed feed.

Note : magpie_debug.php is the one to watch for; you can do a google search on :

site:.edu magpie_debug.php

and you get a number of educational facilities that kindly demonstrate the use of the magpie rss parser.

These demo pages have a textbox where you can enter an rss feed url; the magpie demo parses your feed and outputs it as an html page.

You have to be careful with these programs, though : I actually found one domain (www.scripps.edu) with this remark under the ‘parse rss’ button :

Security Note:
This is a simple example script. If this was a real script we probably wouldn’t allow strangers to submit random URLs, and we certainly wouldn’t simply echo anything passed in the URL. Additionally its a bad idea to leave this example script lying around.

Thank you, you are surely wise like the buddha, I shall try to remember your insight !

….
note: after a while I decided I had had enough fun with magpies and took the blog off-line.

seo tricks : old wine in new bags…

Get some pagerank : this trick used to require tedious, boring link checking, but with SeoLinx (an extension of SeoQuake) that has become a lot easier. SeoLinx shows the stats of a link’s target url, so you don’t have to visit every page to retrieve the stats. Cool plugin. Let’s put it to some practical use.

the trick : comment on old forum threads

Once you have SeoLinx installed, find an ‘old’ forum, register if you haven’t already, and make sure you get a signature link. Sometimes you first have to be a member for a week or write ten posts, but once you have a sig-link you get backlinks off the forum.

Then go comment on really old forum threads.

With SeoLinx you can easily spot the juicy old threads. Old threads on, for instance, DigitalPoint or Webmasterworld are sometimes pagerank 3 (in the case of the DP post: PR2 with 8 posts at the time of writing).

Pick a forum and browse to the last page of its threads. Hover over the thread anchor and SeoLinx shows you the pagerank of the thread page. As long as the number of posts is below the page size (10 or 16, depending on the forum settings), your comment will land on the first page of the thread, the page with that nice pagerank and juice.

Old wine in new bags can be a sweet thing.

the benefit

A pagerank 3 ‘targeted’ anchor is worth about $9 a month, $100 per year. It can take an hour to find a juicy one, but hey, $100 of value for an hour’s work is well worth the trouble.


I might make this a blog feature, seo tips and tricks of the month.