curl trackbacks

I figured I’d blog a post on trackback link building. A trackback is … (post a few and you’ll get it). The trackback protocol itself isn’t that interesting, but the way blog platforms and CMSes implement it makes it an excellent means for network development, because it uses a simple HTTP POST, and cURL makes that easy.

To post a successful link proposal I need some basic data:

about my page

  • url (must exist)
  • blog owner (free)
  • blog name (free)

about the other page

  • url (must exist)
  • excerpt (should be proper normal text)

my page: this is preferably a PHP routine that hacks some text, pictures and videos, PLR or articles together, with a url rewrite. I prefer using xml text files instead of a database; that works faster when you set stuff up.

other page: don’t use “I liked your article so much…”, use text that matches text on the target pages. Preferably get some proper excerpts from xml-feeds like blogsearch, msn and yahoo (the excerpts contain the keywords I searched for, and as anchor text that works better for search engine visibility and link value).

Let’s get some stuff from the MSN rss feed :

  // a generic query = 5% success
  // add "(powered by) wordpress"
  $query = urlencode('keywords+wordpress+trackback');
  // prepend the base URL of the search feed (not shown here)
  $xml = @simplexml_load_file("$query&count=50&first=1&format=rss");
  $trackbacks = array();
  $count = 0;
  if ($xml) {
    foreach ($xml->channel->item as $i) {
      $count++;
      // the data from msn
      $target['link'] = (string) $i->link;
      $target['title'] = (string) $i->title;
      $target['excerpt'] = (string) $i->description;
      // some variables I'll need later on
      $target['id'] = $count;
      $target['trackback'] = '';
      $target['trackback_success'] = 0;
      $trackbacks[] = $target;
    }
  }
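The loop above can be tried offline by feeding it a string instead of a live feed. This is a minimal sketch, assuming a standard RSS 2.0 layout; the function name `feed_to_targets` and the sample feed are made up for illustration and are not part of my script:

```php
<?php
// Hypothetical helper: turn an RSS string into the $trackbacks array used in this post.
function feed_to_targets($rss)
{
    $targets = array();
    $xml = @simplexml_load_string($rss);
    if ($xml === false || !isset($xml->channel->item)) {
        return $targets; // unparsable or empty feed
    }
    $count = 0;
    foreach ($xml->channel->item as $i) {
        $count++;
        $targets[] = array(
            'id'                => $count,
            'link'              => (string) $i->link,
            'title'             => (string) $i->title,
            'excerpt'           => (string) $i->description,
            'trackback'         => '',
            'trackback_success' => 0,
        );
    }
    return $targets;
}

// Made-up sample feed with a single item.
$sample = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<rss version="2.0"><channel><title>demo</title>'
    . '<item><title>Financial trends</title>'
    . '<link>http://example.com/post-1/</link>'
    . '<description>Some text about financial trends.</description></item>'
    . '</channel></rss>';

$trackbacks = feed_to_targets($sample);
```

Keeping the parsing in its own function makes it easy to swap one search feed for another, as long as the feed sticks to channel/item.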

25% of the CMS sites at the top of the search engines run WordPress, and WordPress always uses /trackback/ in the rdf-url. I fetch the source of each url in the search feed and grab all link urls in it; if any of them contains /trackback/, I post a trackback to that url and see if it sticks.

(I could also spider all links and check whether there is an rdf-segment in the target’s source (*1), but that takes a lot of time. I could also program a curl array and use multicurl; for my purposes this works fast enough.)
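For anyone who does want the multicurl route, here is a rough sketch of fetching several pages in parallel with PHP's curl_multi functions. The function name `fetch_all` and its return shape are my own assumptions for illustration; it is not what my script does:

```php
<?php
// Hypothetical helper: fetch several URLs in parallel with curl_multi.
// Returns an array url => body (empty string when a fetch failed).
function fetch_all(array $urls, $timeout = 3)
{
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running > 0 && $status === CURLM_OK);

    $bodies = array();
    foreach ($handles as $url => $ch) {
        $bodies[$url] = (string) curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    // The handles are released when they go out of scope.
    return $bodies;
}
```

With 50 search results the total wait is then roughly one timeout instead of fifty, at the cost of opening all connections at once.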

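  for($t=0;$t<count($trackbacks);$t++) {
    // get the source of the target page (file_get_contents is an assumption, any fetch routine will do)
    $content = @file_get_contents($trackbacks[$t]['link']);
    if(!$content) continue;
    // grab all link urls
    preg_match_all("/<a[\s]+[^>]*?href[\s]?=[\s\"\']+".
      "(.*?)[\"\']+.*?>"."([^<]+|.*?)?<\/a>/",
      $content, $matches);
    $uri_array = $matches[1];
    foreach($uri_array as $key => $link) {
      // 'rackbac' matches both Trackback and trackback
      if(strpos($link, 'rackbac')>0) {
        $trackbacks[$t]['trackback'] = $link;
        break;
      }
    }
  }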
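If the regular expression proves too brittle on messy markup, the same lookup can be done with PHP's DOM extension. This is an alternative technique, not the one my script uses; the function name `find_trackback_link` and the sample HTML are made up:

```php
<?php
// Hypothetical helper: return the first href containing 'trackback', or '' if none.
function find_trackback_link($html)
{
    if ($html === '') {
        return '';
    }
    $doc = new DOMDocument();
    // Real-world pages are rarely valid; silence the parser warnings.
    @$doc->loadHTML($html);
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if (stripos($href, 'trackback') !== false) {
            return $href;
        }
    }
    return '';
}

$html = '<p><a href="http://example.com/about/">About</a> '
      . '<a href="http://example.com/post-1/trackback/">Trackback</a></p>';
echo find_trackback_link($html), "\n";
```

The DOM parser is slower than a regex but copes with attributes in any order and with unquoted values.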

When I fire a trackback, the other script will try to verify that my page has a link and matching text. I have to make sure my page shows the excerpts and links, so I stuff all candidates in a cached xml file.
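Since the other side may look for its own url on my page, it can help to run the same check myself before pinging. A small sketch under that assumption; the function name `candidates_linked_on_page` is hypothetical, and what a given blog platform really verifies may differ:

```php
<?php
// Hypothetical pre-flight check: keep only candidates whose link appears in my page's HTML.
function candidates_linked_on_page($page_html, array $trackbacks)
{
    $linked = array();
    foreach ($trackbacks as $target) {
        $link = isset($target['link']) ? $target['link'] : '';
        if ($link === '') {
            continue;
        }
        // The link may appear raw or HTML-escaped (& written as &amp;) in the page source.
        if (strpos($page_html, $link) !== false
            || strpos($page_html, htmlspecialchars($link)) !== false) {
            $linked[] = $target;
        }
    }
    return $linked;
}
```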

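  function cache_xml_store($trackbacks, $pagetitle)
  {
    // the root element name is an assumption, the reader only looks at the entry elements
    $xml = '<?xml version="1.0" encoding="UTF-8"?>
  <trackbacks>';
    for($a=0;$a<count($trackbacks);$a++) {
      $arr = $trackbacks[$a];
      // escape the values so the file stays valid xml
      $xml .= '<entry>';
      $xml .= '<id>'.$arr['id'].'</id>';
      $xml .= '<description>'.htmlspecialchars($arr['excerpt']).'</description>';
      $xml .= '<link>'.htmlspecialchars($arr['link']).'</link>';
      $xml .= '<title>'.htmlspecialchars($arr['title']).'</title>';
      $xml .= '</entry>';
    }
    $xml .= '</trackbacks>';
    $fname = 'cache/trackback'.urlencode($pagetitle).'.xml';
    // $fname already contains the cache/ directory
    if(file_exists($fname)) unlink($fname);
    $fhandle = fopen($fname, 'w');
    fwrite($fhandle, $xml);
    fclose($fhandle);
    return;
  }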
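Feed excerpts are full of ampersands, quotes and angle brackets, and a single unescaped one makes the cache unreadable for simplexml. The sketch below shows a round trip with escaping. The function name `trackbacks_to_xml` and the root element `trackbacks` are assumptions for the sake of the example:

```php
<?php
// Hypothetical helper: build the cache XML as a string, escaping every value.
function trackbacks_to_xml(array $trackbacks)
{
    // xml element name => key in the $trackbacks entries
    $fields = array('id' => 'id', 'link' => 'link', 'title' => 'title', 'description' => 'excerpt');
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n" . '<trackbacks>'; // root name assumed
    foreach ($trackbacks as $arr) {
        $xml .= '<entry>';
        foreach ($fields as $element => $key) {
            $value = isset($arr[$key]) ? (string) $arr[$key] : '';
            $xml .= "<$element>" . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . "</$element>";
        }
        $xml .= '</entry>';
    }
    return $xml . '</trackbacks>';
}

$xml = trackbacks_to_xml(array(array(
    'id'      => 1,
    'link'    => 'http://example.com/?a=1&b=2',
    'title'   => 'Tom & Jerry <live>',
    'excerpt' => 'An excerpt with "quotes".',
)));
$parsed = simplexml_load_string($xml);
```

simplexml decodes the entities again on reading, so the values come back exactly as they went in.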

I use simplexml to read that cached file and show the excerpts and links once the page is requested.

  // retrieve the cached xml and return it as array.
  function cache_xml_retrieve($pagetitle)
  {
    $fname = 'cache/trackback'.urlencode($pagetitle).'.xml';
    if(file_exists($fname)) {
      $xml = @simplexml_load_file($fname);
      if(!$xml) return false;
      $trackbacks = array();
      foreach($xml->entry as $e) {
        $trackback['id'] = (string) $e->id;
        // rid() is a helper from my own script, not shown here
        $trackback['link'] = rid((string) $e->link);
        $trackback['title'] = (string) $e->title;
        $trackback['description'] = (string) $e->description;
        $trackbacks[] = $trackback;
      }
      return $trackbacks;
    }
    return false;
  }

(this setup requires a subdirectory cache set to read/write with chmod 777)

From the request url I extract the page title, for example “financial trends”, which has its own cached xml-file (in my own script I use SEF urls with mod_rewrite; you can also use the $_SERVER array).
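As a rough illustration of the $_SERVER route, the page title can be taken from the last path segment of the request. The url layout (`/financial-trends/`) and the function name `pagetitle_from_uri` are assumptions for this sketch, not the layout of my own site:

```php
<?php
// Hypothetical helper: take the last path segment of a request URI as the page title.
function pagetitle_from_uri($request_uri)
{
    $path = (string) parse_url($request_uri, PHP_URL_PATH);
    $segment = basename(rtrim($path, '/'));
    // '+' and '-' both stand for spaces in this sketch
    $title = str_replace(array('+', '-'), ' ', urldecode($segment));
    // allow only letters, digits and spaces, since the title ends up in a file name
    return trim(preg_replace('/[^A-Za-z0-9 ]/', '', $title));
}

// In a live script: $pagetitle = pagetitle_from_uri($_SERVER['REQUEST_URI']);
echo pagetitle_from_uri('/financial-trends/?page=2'), "\n";
```

Whitelisting the characters matters here because the title goes straight into the cache file name.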

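  $pagetitle = preg_replace('/\+/', ' ', htmlentities($_REQUEST['title'], ENT_QUOTES, "UTF-8"));
  $cached_excerpts = cache_xml_retrieve($pagetitle);
  // do some stuff with it, make it look nice (a plain link is assumed here, use your own markup)
  if($cached_excerpts) {
    for($s=0;$s<count($cached_excerpts);$s++) {
      echo '<a href="'.$cached_excerpts[$s]['link'].'">'.$cached_excerpts[$s]['title'].'</a>';
    }
  }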

Now I prepare the data for the trackback post:

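  for($t=0;$t<count($trackbacks);$t++) {
    // only targets where a trackback url was found
    if($trackbacks[$t]['trackback'] != '') {
      $trackback_url = $trackbacks[$t]['trackback'];
      $mytrackbackdata = array(
        "url" => "url of my page with the link to the target",
        "title" => "title of my page",
        "blog_name" => "name of my blog",
        "excerpt" => '[...]'.trim(substr($trackbacks[$t]['excerpt'], 0, 150)).'[...]'
      );
      //...and try the trackback
      $trackbacks[$t]['trackback_success'] = trackback_ping($trackback_url, $mytrackbackdata);
    }
  }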
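The four fields can also be encoded in one go with PHP's built-in http_build_query instead of a hand-rolled loop. A small sketch; the function name `build_trackback_postfields` and the example values are made up:

```php
<?php
// Hypothetical helper: assemble the four trackback fields and encode them for a POST body.
function build_trackback_postfields($my_url, $my_title, $blog_name, $excerpt)
{
    $data = array(
        'url'       => $my_url,
        'title'     => $my_title,
        'blog_name' => $blog_name,
        'excerpt'   => '[...]' . trim(substr($excerpt, 0, 150)) . '[...]',
    );
    // PHP_QUERY_RFC3986 encodes spaces as %20, like rawurlencode()
    return http_build_query($data, '', '&', PHP_QUERY_RFC3986);
}

$postfields = build_trackback_postfields(
    'http://example.com/financial-trends/',
    'Financial trends',
    'My blog',
    'Some text about financial trends.'
);
echo $postfields, "\n";
```

Note that substr counts bytes, so with multibyte text the 150-character cut can land in the middle of a character.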

This is the actual trackback post using cURL. cURL has a convenient timeout setting; I use three seconds. If a host does not respond within half a second it is probably dead, so three seconds is generous.

  function trackback_ping($trackback_url, $trackback)
  {
    //make a string of the data array to post
    $strout = array();
    foreach($trackback as $key=>$value) $strout[]=$key."=".rawurlencode($value);
    $postfields = implode('&', $strout);
    //create a curl instance
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $trackback_url);
    curl_setopt($ch, CURLOPT_TIMEOUT, 3);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    //set a custom form header
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-type: application/x-www-form-urlencoded'));
    //no CURLOPT_NOBODY here: the response body is needed for the check below
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);
    $content = curl_exec($ch);
    //if the return has a tag 'error' with as value 0 it went flawless
    $success = 0;
    if(is_string($content) && strpos($content, '>0')>0) $success = 1;
    curl_close($ch);
    unset($ch);
    return $success;
  }
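The string check on '>0' is quick but loose. A trackback response is a small xml document with an error element (0 for success) and, on failure, a message element, so it can also be parsed properly. A minimal sketch; the function name `parse_trackback_response` and the sample bodies are made up for illustration:

```php
<?php
// Hypothetical helper: interpret a trackback response body.
// Returns array(success, message).
function parse_trackback_response($body)
{
    if (!is_string($body) || trim($body) === '') {
        return array(false, 'empty response');
    }
    $xml = @simplexml_load_string(trim($body));
    if ($xml === false || !isset($xml->error)) {
        return array(false, 'not a trackback response');
    }
    if (trim((string) $xml->error) === '0') {
        return array(true, '');
    }
    return array(false, (string) $xml->message);
}

$ok  = '<?xml version="1.0" encoding="utf-8"?><response><error>0</error></response>';
$bad = '<?xml version="1.0" encoding="utf-8"?><response><error>1</error>'
     . '<message>Duplicate comment</message></response>';

list($success, $message) = parse_trackback_response($bad);
```

Logging the message for failed pings shows quickly whether a target rejects duplicates, wants moderation, or has trackbacks closed.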

Now the last routine: rewrite the cached xml file with only the successful trackbacks (seo stuff):

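  $store_trackbacks = array();
  for($t=0;$t<count($trackbacks);$t++) {
    // keep only the trackbacks that stuck
    if($trackbacks[$t]['trackback_success']>0) {
      $store_trackbacks[]=$trackbacks[$t];
    }
  }
  cache_xml_store($store_trackbacks, $pagetitle);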
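The same selection can be written with array_filter. A tiny sketch with made-up sample data:

```php
<?php
// Made-up sample: three candidates, two of which were accepted.
$trackbacks = array(
    array('id' => 1, 'trackback_success' => 1),
    array('id' => 2, 'trackback_success' => 0),
    array('id' => 3, 'trackback_success' => 1),
);

// array_filter keeps the original keys, array_values renumbers them from 0.
$store_trackbacks = array_values(array_filter($trackbacks, function ($t) {
    return $t['trackback_success'] > 0;
}));
```

The renumbering matters because the store routine walks the array with a counting for-loop.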

Voilà: a page with only successful trackbacks.

Google (the backrub engine) doesn’t like sites that use automated link-building methods; other engines (Baidu, MSN, Yahoo) use a more conventional link-popularity and keyword-matching algorithm. Trackback linking helps give you a clear engine profile at relatively low cost.

0) For brevity and clarity, the code above is rewritten (taken from a trackback script I am developing on another site); it can contain some typos.

*1) If you want to spider links for rdf-segments: TYPO3v4 has some code for easy retrieval of trackback-uris:

  /**
   * Fetches ping url from the given url
   *
   * @param	string	$url	URL to probe for RDF
   * @return	string	Ping URL
   */
  protected function getPingURL($url) {
    $pingUrl = '';
    // Get URL content
    $urlContent = t3lib_div::getURL($url);
    if ($urlContent && ($rdfPos = strpos($urlContent, '<rdf:RDF')) !== false) {
      if (($endPos = strpos($urlContent, '</rdf:RDF>', $rdfPos)) !== false) {
        // We will use quick regular expression to find ping URL
        // substr() takes a length, so pass the distance between both positions
        $rdfContent = substr($urlContent, $rdfPos, $endPos - $rdfPos);
        // preg_match() extracts just the url from the trackback:ping attribute
        if (preg_match('/trackback:ping="([^"]+)"/', $rdfContent, $m)) {
          $pingUrl = $m[1];
        }
      }
    }
    return $pingUrl;
  }

4 thoughts on “curl trackbacks”

  1. I get about 10% backlinks when I spider, more than I expected as the site doesn’t have any content of its own. Buying links or using tnx is more effective, but as a background process this works better. I want a stand-alone script with an xml cache so I can use it on other sites without being dependent on script capabilities.
