curl trackbacks

I figured I’d blog a post on trackback link building. A trackback is … (post a few and you’ll get it). The trackback protocol itself isn’t that interesting, but the way blog platforms and CMSes implement it makes it an excellent means of network development: a trackback is just a simple HTTP POST, and cURL makes that easy.

To post a successful link proposal I need some basic data:

about my page

  • url (must exist)
  • blog owner (free)
  • blog name (free)

about the other page

  • url (must exist)
  • excerpt (should be proper normal text)
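Most of this data ends up in the form fields a TrackBack ping carries: `url`, `title`, `blog_name` and `excerpt` (field names from the TrackBack specification; the URL of the other page is where the ping goes, not a field). A minimal sketch of turning the data into a POST body; the function name and the keys of `$mypage` are hypothetical, chosen for illustration:

```php
<?php
// Hypothetical helper: build the form-encoded body of a TrackBack ping.
// $mypage holds data about my page, $excerpt is text taken from the other page.
function build_trackback_body(array $mypage, $excerpt)
{
    // my page url must exist, so refuse obviously broken values
    if (filter_var($mypage['url'], FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $fields = array(
        'url'       => $mypage['url'],
        'title'     => $mypage['title'],
        'blog_name' => $mypage['blog_name'],
        'excerpt'   => $excerpt,
    );
    // RFC 3986 encoding: spaces become %20, like rawurlencode()
    return http_build_query($fields, '', '&', PHP_QUERY_RFC3986);
}

$body = build_trackback_body(array(
    'url'       => 'http://www.domain.com/financial+trends.html',
    'title'     => 'Financial trends',
    'blog_name' => 'My blog',
), 'Financial trends in 2008');
echo $body, "\n";
```

`http_build_query()` with `PHP_QUERY_RFC3986` encodes the same way as a `rawurlencode()` loop, so it can stand in for one.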

My page: this is preferably a PHP routine that hacks some text, pictures and videos, PLR or articles together behind a URL rewrite. I prefer XML text files instead of a database; that works faster when you set things up.

Other page: don’t use “I liked your article so much…”; use text that matches text on the target pages. Preferably take proper excerpts from XML feeds like blogsearch, MSN and Yahoo: those excerpts contain the keywords I searched for, and as anchor text that works better for search engine visibility and link value.
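One way to make sure an excerpt really contains the keywords is to filter the feed items before using them. A sketch, assuming a hypothetical `excerpt_matches()` helper and an "all keywords must appear" rule (my assumption, not part of the original script):

```php
<?php
// Hypothetical filter: true when every keyword occurs in the excerpt
// (case-insensitive, after stripping any HTML the feed left in).
function excerpt_matches($excerpt, array $keywords)
{
    $text = strtolower(strip_tags($excerpt));
    foreach ($keywords as $keyword) {
        if (strpos($text, strtolower($keyword)) === false) {
            return false;
        }
    }
    return true;
}

$items = array(
    'Latest <b>financial</b> trends for small investors',
    'I liked your article so much...',
);
// keep only the items that mention both keywords
$keep = array_values(array_filter($items, function ($excerpt) {
    return excerpt_matches($excerpt, array('financial', 'trends'));
}));
echo count($keep), "\n";
```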

Let’s get some stuff from the MSN RSS feed:

  // a generic query = 5% success,
  // so add "(powered by) wordpress" to the keywords
  $query = urlencode('keywords+wordpress+trackback');
  $xml = @simplexml_load_file("http://search.live.com/results.aspx?q=$query&count=50&first=1&format=rss");
  $trackbacks = array();
  $count = 0;
  if ($xml) {
      foreach ($xml->channel->item as $i) {
          $count++;

          // the data from msn
          $target['link'] = (string) $i->link;
          $target['title'] = (string) $i->title;
          $target['excerpt'] = (string) $i->description;

          // some variables I'll need later on
          $target['id'] = $count;
          $target['trackback'] = '';
          $target['trackback_success'] = 0;

          $trackbacks[] = $target;
      }
  }
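Live Search has since become Bing, so the live.com endpoint above may no longer answer. To try the extraction step without the network, here is a sketch that runs the same logic on an RSS string; `rss_to_targets()` and the sample feed are hypothetical:

```php
<?php
// Hypothetical helper: the same extraction as above, but on an RSS string,
// so it can be tried without any network access.
function rss_to_targets($rss)
{
    $targets = array();
    $xml = @simplexml_load_string($rss);
    if ($xml === false) {
        return $targets; // not parseable XML: no candidates
    }
    $count = 0;
    foreach ($xml->channel->item as $i) {
        $count++;
        $targets[] = array(
            'id'                => $count,
            'link'              => (string) $i->link,
            'title'             => (string) $i->title,
            'excerpt'           => (string) $i->description,
            'trackback'         => '',
            'trackback_success' => 0,
        );
    }
    return $targets;
}

$rss = '<?xml version="1.0"?><rss version="2.0"><channel><title>results</title>'
     . '<item><title>Financial trends</title><link>http://example.com/financial-trends/</link>'
     . '<description>Some financial trends for 2008</description></item>'
     . '</channel></rss>';
$targets = rss_to_targets($rss);
echo count($targets), ' ', $targets[0]['link'], "\n";
```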

About 25% of the CMS sites at the top of the search engines run WordPress, and WordPress always uses /trackback/ in the RDF trackback URL. I fetch the source of each URL in the search feed and grab all link URLs in it; if any contains /trackback/, I post a trackback to that URL and see if it sticks.

(I could also spider all links and check whether there is an RDF segment in the target’s source (*1), but that takes a lot of time. I could also build an array of cURL handles and use curl_multi; for my purposes this works fast enough.)

  for ($t = 0; $t < count($trackbacks); $t++) {
      // I could use curl,
      // but 95% of the urls offered are kosher and respond fast
      $content = @file_get_contents($trackbacks[$t]['link']);
      if ($content === false) continue;

      // collect the href value of every <a> tag
      preg_match_all('/<a\s+[^>]*?href\s*=\s*["\']?([^"\'\s>]+)/i', $content, $matches);
      foreach ($matches[1] as $link) {
          // WordPress trackback urls contain /trackback/
          if (stripos($link, 'trackback') !== false) {
              $trackbacks[$t]['trackback'] = $link;
              break;
          }
      }
  }
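Regular expressions on raw HTML are fragile. As an alternative, PHP's DOM extension can parse the page and walk its `<a>` elements. A sketch with a hypothetical `find_trackback_link()` that works on HTML you have already fetched:

```php
<?php
// Hypothetical helper: return the first href in $html that looks like a
// WordPress trackback URL, or '' when there is none.
function find_trackback_link($html)
{
    if (trim($html) === '') {
        return ''; // DOMDocument refuses empty input
    }
    $doc = new DOMDocument();
    // real-world pages are rarely valid HTML; silence libxml's warnings
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if (stripos($href, '/trackback') !== false) {
            return $href;
        }
    }
    return '';
}

$page = '<p><a href="http://example.com/">home</a> '
      . '<a href="http://example.com/2008/05/post/trackback/">Trackback URI</a></p>';
echo find_trackback_link($page), "\n";
```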

When I fire a trackback, the receiving script may try to verify that my page has a link back and matching text. So I have to make sure my page shows the excerpts and links, which is why I store all candidates in a cached XML file.

  function cache_xml_store($trackbacks, $pagetitle)
  {
      $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n" . '<trackbacks>';
      foreach ($trackbacks as $arr) {
          $xml .= '<entry>';
          // escape the values: excerpts and titles often contain & or <
          $xml .= '<id>' . htmlspecialchars($arr['id'], ENT_QUOTES, 'UTF-8') . '</id>';
          $xml .= '<excerpt>' . htmlspecialchars($arr['excerpt'], ENT_QUOTES, 'UTF-8') . '</excerpt>';
          $xml .= '<link>' . htmlspecialchars($arr['link'], ENT_QUOTES, 'UTF-8') . '</link>';
          $xml .= '<title>' . htmlspecialchars($arr['title'], ENT_QUOTES, 'UTF-8') . '</title>';
          $xml .= '</entry>';
      }
      $xml .= '</trackbacks>';

      $fname = 'cache/trackback' . urlencode($pagetitle) . '.xml';
      // file_put_contents() overwrites any existing cache file
      file_put_contents($fname, $xml);
  }
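Excerpts from search feeds regularly contain `&` or `<`, which breaks hand-built XML unless every value is escaped. A sketch of letting DOMDocument do the escaping instead, with a hypothetical `build_cache_xml()` that produces the same `<trackbacks>`/`<entry>` layout, so SimpleXML can read it back:

```php
<?php
// Hypothetical alternative to string concatenation: let DOMDocument
// build the XML so that &, < and > in excerpts are escaped.
function build_cache_xml(array $items)
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $root = $doc->appendChild($doc->createElement('trackbacks'));
    foreach ($items as $item) {
        $entry = $root->appendChild($doc->createElement('entry'));
        foreach (array('id', 'excerpt', 'link', 'title') as $field) {
            $node = $entry->appendChild($doc->createElement($field));
            // createTextNode() escapes special characters for us
            $node->appendChild($doc->createTextNode((string) $item[$field]));
        }
    }
    return $doc->saveXML();
}

$xml = build_cache_xml(array(array(
    'id'      => 1,
    'excerpt' => 'Q&A: are <b>financial trends</b> predictable?',
    'link'    => 'http://example.com/?p=1&replytocom=2',
    'title'   => 'Financial trends',
)));
// round trip: read the cache back the way the page does
$back = simplexml_load_string($xml);
echo (string) $back->entry[0]->excerpt, "\n";
```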

I use SimpleXML to read that cached file and show the excerpts and links when the page is requested.

  // retrieve the cached xml and return it as an array
  function cache_xml_retrieve($pagetitle)
  {
      $fname = 'cache/trackback' . urlencode($pagetitle) . '.xml';
      if (file_exists($fname)) {
          $xml = @simplexml_load_file($fname);
          if (!$xml) return false;
          $trackbacks = array();
          foreach ($xml->entry as $e) {
              $trackback = array();
              $trackback['id'] = (string) $e->id;
              $trackback['link'] = (string) $e->link;
              $trackback['title'] = (string) $e->title;
              $trackback['excerpt'] = (string) $e->excerpt;

              $trackbacks[] = $trackback;
          }
          return $trackbacks;
      }
      return false;
  }
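Because the receiving blog may check my page for a link back, a quick self-check before pinging can save wasted pings. This is a sketch under my own assumptions (a hypothetical `page_links_to()` that compares href values and ignores trailing slashes); real trackback receivers may apply different rules:

```php
<?php
// Hypothetical self-check: does the rendered HTML of my page contain
// a link to $target? Trailing slashes are ignored in the comparison.
function page_links_to($html, $target)
{
    if (trim($html) === '') {
        return false;
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $want = rtrim($target, '/');
    foreach ($doc->getElementsByTagName('a') as $a) {
        if (rtrim($a->getAttribute('href'), '/') === $want) {
            return true;
        }
    }
    return false;
}

$mine = '<p>Some financial trends for 2008 '
      . '<a href="http://example.com/financial-trends/">Financial trends</a></p>';
var_dump(page_links_to($mine, 'http://example.com/financial-trends'));
```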

(This setup requires a subdirectory named cache that the web server can read and write, e.g. set with chmod 777.)
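If you would rather create that directory from the script than by hand, a minimal sketch (the helper name is hypothetical). The mode passed to `mkdir()` is filtered by the process umask, so you may still need a `chmod` afterwards:

```php
<?php
// Hypothetical helper: make sure the cache directory exists and is writable.
// mkdir()'s mode is filtered by the umask, so 0777 may end up as 0755.
function ensure_cache_dir($dir)
{
    if (!is_dir($dir) && !@mkdir($dir, 0777, true)) {
        return false; // could not create it
    }
    return is_writable($dir);
}

$dir = sys_get_temp_dir() . '/trackback-cache-demo-' . getmypid();
var_dump(ensure_cache_dir($dir));
```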

I use http://www.domain.com/financial+trends.html and extract the page title as “financial trends”, whose cache file is http://www.domain.com/cache/trackbackfinancial+trends.xml. (In my own script I use SEF URLs with mod_rewrite; you can also use the $_SERVER array.)
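Getting the page title from the request path takes only a few built-in functions. A sketch with a hypothetical `pagetitle_from_path()`, which you could feed `$_SERVER['REQUEST_URI']`:

```php
<?php
// Hypothetical helper: turn a request path such as "/financial+trends.html"
// into the page title "financial trends".
function pagetitle_from_path($path)
{
    // drop any query string, keep only the path part
    $path = (string) parse_url($path, PHP_URL_PATH);
    // "/financial+trends.html" -> "financial+trends"
    $slug = basename($path, '.html');
    // urldecode() turns "+" into a space and decodes %XX sequences
    return urldecode($slug);
}

echo pagetitle_from_path('/financial+trends.html'), "\n";
echo pagetitle_from_path('http://www.domain.com/financial+trends.html?x=1'), "\n";
```

`urlencode('financial trends')` gives back `financial+trends`, so the title maps cleanly onto the cache file name.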

  // "financial+trends" -> "financial trends"
  $title = isset($_REQUEST['title']) ? $_REQUEST['title'] : '';
  $pagetitle = str_replace('+', ' ', htmlentities($title, ENT_QUOTES, 'UTF-8'));

  $cached_excerpts = cache_xml_retrieve($pagetitle);

  // do some stuff with it, make it look nice:
  if ($cached_excerpts) {
      foreach ($cached_excerpts as $c) {
          // this lists the trackback (candidates)
          echo htmlspecialchars($c['excerpt'], ENT_QUOTES, 'UTF-8');
          echo '<a href="' . htmlspecialchars($c['link'], ENT_QUOTES, 'UTF-8') . '">'
              . htmlspecialchars($c['title'], ENT_QUOTES, 'UTF-8') . '</a>';
      }
  }

Now I prepare the data for the trackback post:

  for ($t = 0; $t < count($trackbacks); $t++) {

      $trackback_url = $trackbacks[$t]['trackback'];
      // does it have a trackback target url? then prepare the data:
      if ($trackback_url != '') {
          $trackback_data = array(
              "url"       => "url of my page with the link to the target",
              "title"     => "title of my page",
              "blog_name" => "name of my blog",
              "excerpt"   => '[…] ' . trim(mb_substr(strip_tags($trackbacks[$t]['excerpt']), 0, 150, 'UTF-8')) . ' […]'
          );
          // …and try the trackback
          $trackbacks[$t]['trackback_success'] = trackback_ping($trackback_url, $trackback_data);
      }
  }
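A fixed-length cut can split a word (or, with byte-based functions, a multi-byte character) in half. A sketch of a word-boundary excerpt, assuming the mbstring extension is available; `make_excerpt()` is a hypothetical helper:

```php
<?php
// Hypothetical helper: a plain-text excerpt of at most $max characters,
// cut at a word boundary and wrapped in […].
function make_excerpt($text, $max = 150)
{
    // strip any HTML from the feed and collapse runs of whitespace
    $text = trim(preg_replace('/\s+/u', ' ', strip_tags($text)));
    if (mb_strlen($text, 'UTF-8') > $max) {
        // look one character further so a word ending exactly at $max survives
        $cut = mb_substr($text, 0, $max + 1, 'UTF-8');
        $space = mb_strrpos($cut, ' ', 0, 'UTF-8');
        // back up to the last space; without one, fall back to a hard cut
        $text = ($space !== false)
            ? mb_substr($cut, 0, $space, 'UTF-8')
            : mb_substr($text, 0, $max, 'UTF-8');
    }
    return '[…] ' . $text . ' […]';
}

echo make_excerpt('<p>Financial   trends for the coming year</p>', 20), "\n";
```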

This is the actual trackback post using cURL. cURL has a convenient timeout setting; I use three seconds. If a host does not respond within half a second it’s probably dead, so three seconds is generous.

  function trackback_ping($trackback_url, $trackback)
  {
      // make a form-encoded string of the data array to post
      $strout = array();
      foreach ($trackback as $key => $value) {
          $strout[] = $key . '=' . rawurlencode($value);
      }
      $postfields = implode('&', $strout);

      // create a curl instance
      $ch = curl_init();
      curl_setopt($ch, CURLOPT_URL, $trackback_url);
      curl_setopt($ch, CURLOPT_TIMEOUT, 3);
      curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

      // set a custom form header
      curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/x-www-form-urlencoded; charset=utf-8'));

      // a plain POST; CURLOPT_NOBODY is left out because it asks for a
      // request without a response body, and we need the body to read the result
      curl_setopt($ch, CURLOPT_POST, true);
      curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);

      $content = curl_exec($ch);
      curl_close($ch);

      // the response is XML; an <error> tag with value 0 means it went flawless
      $success = 0;
      if ($content !== false && preg_match('/<error>\s*0\s*<\/error>/', $content)) $success = 1;
      return $success;
  }
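The TrackBack specification defines the reply as a small XML document: `<error>0</error>` for success, or `<error>1</error>` plus a `<message>` element explaining the failure. A sketch that parses it with SimpleXML instead of string matching; `parse_trackback_response()` and the sample message are hypothetical:

```php
<?php
// Hypothetical helper: read a TrackBack response document.
// Returns array(success, message); success is true only for <error>0</error>.
function parse_trackback_response($body)
{
    if (trim((string) $body) === '') {
        return array(false, 'not a trackback response');
    }
    $xml = @simplexml_load_string((string) $body);
    if ($xml === false || !isset($xml->error)) {
        return array(false, 'not a trackback response');
    }
    $ok = trim((string) $xml->error) === '0';
    return array($ok, trim((string) $xml->message));
}

list($ok, $msg) = parse_trackback_response(
    '<?xml version="1.0" encoding="utf-8"?><response><error>1</error>'
    . '<message>Ping already registered.</message></response>'
);
var_dump($ok);
echo $msg, "\n";
```

Keeping the message around makes it easier to see why a given blog rejects pings.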

Now the last routine: rewrite the cached XML file with only the successful trackbacks (SEO stuff):

  $store_trackbacks = array();
  for ($t = 0; $t < count($trackbacks); $t++) {
      if ($trackbacks[$t]['trackback_success'] > 0) {
          $store_trackbacks[] = $trackbacks[$t];
      }
  }
  cache_xml_store($store_trackbacks, $pagetitle);
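The same filter can be written with `array_filter()`. A sketch (the function name is hypothetical); `array_values()` renumbers the keys from 0, which matters for any code that walks the result with a numeric for loop, as several routines in this post do:

```php
<?php
// Hypothetical equivalent of the loop above using array_filter().
function successful_trackbacks(array $trackbacks)
{
    return array_values(array_filter($trackbacks, function ($t) {
        return $t['trackback_success'] > 0;
    }));
}

$all = array(
    array('id' => 1, 'trackback_success' => 0),
    array('id' => 2, 'trackback_success' => 1),
    array('id' => 3, 'trackback_success' => 1),
);
$stored = successful_trackbacks($all);
echo count($stored), ' first id: ', $stored[0]['id'], "\n";
```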

Voilà: a page with only successful trackbacks.

Google (the BackRub engine) doesn’t like sites that use automated link-building methods; other engines (Baidu, MSN, Yahoo) use a more conventional link-popularity and keyword-matching algorithm. Trackback linking helps you get a clear search engine profile at relatively low cost.

0) For brevity and clarity, the code above is rewritten (taken from a trackback script I am developing on another site), so it may contain some typos.

*1) If you want to spider links for RDF segments: TYPO3 v4 has some code for easy retrieval of trackback URIs (the version below is based on its getPingURL() method):

  /**
   * Fetches ping url from the given url
   *
   * @param string $url URL to probe for RDF
   * @return string Ping URL
   */
  protected function getPingURL($url) {
      $pingUrl = '';
      // Get URL content
      $urlContent = t3lib_div::getURL($url);
      if ($urlContent && ($rdfPos = strpos($urlContent, '<rdf:RDF')) !== false) {
          // RDF exists in this content. Cut everything before it
          $urlContent = substr($urlContent, $rdfPos);
          if (($endPos = strpos($urlContent, '</rdf:RDF>')) !== false) {
              // Keep only the RDF block
              $rdfContent = substr($urlContent, 0, $endPos);
              // We will use a quick regular expression to find the ping URL
              if (preg_match('/trackback:ping="([^"]+)"/', $rdfContent, $m)) {
                  $pingUrl = $m[1];
              }
          }
      }
      return $pingUrl;
  }
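Outside TYPO3 there is no `t3lib_div::getURL()`, so here is a sketch of the same RDF lookup as a plain function on HTML you already fetched (for example with `file_get_contents()`). `find_ping_url()` and the sample page are hypothetical; WordPress typically prints such an RDF block inside an HTML comment:

```php
<?php
// Hypothetical standalone version of the RDF lookup: find the
// trackback:ping attribute inside the page's <rdf:RDF> block.
function find_ping_url($html)
{
    $start = strpos($html, '<rdf:RDF');
    if ($start === false) {
        return '';
    }
    $end = strpos($html, '</rdf:RDF>', $start);
    if ($end === false) {
        return '';
    }
    // only look inside the RDF block for the trackback:ping attribute
    $rdf = substr($html, $start, $end - $start);
    if (preg_match('/trackback:ping="([^"]+)"/', $rdf, $m)) {
        return $m[1];
    }
    return '';
}

$page = '<html><body><!--' . "\n"
      . '<rdf:RDF>' . "\n"
      . '<rdf:Description rdf:about="http://example.com/2008/05/post/"' . "\n"
      . '    trackback:ping="http://example.com/2008/05/post/trackback/" />' . "\n"
      . '</rdf:RDF>' . "\n"
      . '--></body></html>';
echo find_ping_url($page), "\n";
```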
Posted in links, php, seo tips and tricks.

4 Comments

  1. I get about 10% backlinks when I spider, more than I expected, as the site doesn’t have any content of its own. Buying links or using TNX is more effective, but as a background process this works better. I want a stand-alone script with an XML cache so I can use it on other sites without depending on script capabilities.
