curl trackbacks

I figure i’d blog a post on trackback linkbuilding. A trackback is … (post a few and you’ll get it). The trackback protocol isn’t that interesting, but the implementation of it by blog-platforms and cms’es makes it an excellent means for network development, because it uses a simple http-post. cUrl makes that easy).

To post a succesful link proposal I need some basic data :

about my page

  • url (must exist)
  • blog owner (free)
  • blog name (free)

about the other page

  • url (must exist)
  • excerpt (should be proper normal text)

my page : this is preferably a php routine that hacks some text, pictures and video’s, PLR or articles together, with a url rewrite. I prefer using xml textfiles in stead of a database, works faster when you set stuff up.

other page : don’t use “I liked your article so much…”, use text that maches text on target pages, preferably get some proper excerpts from xml-feeds like blogsearch, msn and yahoo (excerpts contain the keywords I searched for, as anchor text it works better for search engine visibility and link value).

Let’s get some stuff from the MSN rss feed :

//a generic query = 5% success
//add "(powered by) wordpress" 
      $query=urlencode('keywords+wordpress+trackback');
      $xml = @simplexml_load_file("http://search.live.com/results.aspx?q=$query&count=50&first=1&format=rss");
      $count=0;
      foreach($xml->channel->item as $i) {

           $count++;

//the data from msn
           $target['link'] = (string) $i->link;
           $target['title'] = (string) $i->title;
           $target['excerpt'] = (string) $i->description;

//some variables I'll need later on
           $target[id'] = $count;
           $target['trackback'] = '';
           $target['trackback_success'] = 0;

           $trackbacks[]=$target;
       }

25% of the cms sites in the top of the search engines are WordPress scripts and WordPress always uses /trackback/ in the rdf-url. I get the source of the urls in the search-feed and grab all link-url’s in it, if any contains /trackback/, I post a trackback to that url and see if it sticks.

(I can also spider all links and check if there is an rdf-segment in the target’s source (*1), but that takes a lot of time, I could also program a curl array and use multicurl, for my purposes this works fast enough).

for($t=0;$t]*?href[\s]?=[\s\"\']+".
           "(.*?)[\"\']+.*?>"."([^< ]+|.*?)?<\/a>/",
        $content, &$matches);
	$uri_array = $matches[1];
	foreach($uri_array as $key => $link) { 
             if(strpos($link, 'rackbac')>0) { 
                $trackbacks[$t]['trackback'] = $link;
                break; 
             }
        }
}

When I fire a trackback, the other script will try and assert if my page has a link and matching text. I have to make sure my page shows the excerpts and links, so I stuff all candidates in a cached xml file.

function cache_xml_store($trackbacks, $pagetitle) 
{
	$xml = '< ?xml version="1.0" encoding="UTF-8"?>
	';
	for($a=0;$a';
		$xml .= ''.$arr['excerpt'].'';
		$xml .= ''.$arr['link'].'';
		$xml .= ''.$arr['title'].'';
		$xml .= '';
	}
	$xml .= '';
	
	$fname = 'cache/trackback'.urlencode($pagetitle).'.xml';
	if(file_exists($fname)) unlink('cache/'.$fname);
	$fhandle = fopen($fname, 'w');
	fwrite($fhandle, $xml);
	fclose($fhandle);
	return;
}

I use simplexml to read that cached file and show the excertps and links once the page is requested.

// retrieve the cached xml and return it as array.
function cache_xml_retrieve($pagetitle)
{
	$fname = 'cache/trackback'.urlencode($pagetitle).'.xml';
	if(file_exists($fname)) {
		$xml=@simplexml_load_file($fname);
		if(!$xml) return false;
		foreach($xml->entry as $e) {
			$trackback['id'] =(string) $e->id;
			$trackback['link'] =  rid((string) $e->link);
			$trackback['title'] =  (string) $e->title;
			$trackback['description'] =  (string) $e->description;

			$trackbacks[] = $arr;
		}
		return $trackbacks;
	} 
	return false;
}

(this setup requires a subdirectory cache set to read/write with chmod 777)

I use http://www.domain.com/financial+trends.html and extract the pagetitle as “financial trends’, which has an xml-file http://www.domain.com/cache/financial+trends.xml. (In my own script I use sef urls with mod_rewrite, you can also use the $_SERVER array).

$pagetitle=preg_replace('/\+/', ' ', htmlentities($_REQUEST['title'], ENT_QUOTES, "UTF-8"));

$cached_excerpts = cache_xml_retrieve($pagetitle);

//do some stuff with, make it look nice  :
for($s=0;$s'.$cached_excerpts['title'].'';
}

Now I prepare the data for the trackback post :

for($t=0;$t "url of my page with the link to the target",
 	"title" => "title of my page",
	"blog_name" => "name of my blog",
	"excerpt" => '[...]'.trim(substr($trackbacks[$t]['description'], 0, 150).'[...]'
        );
        //...and try the trackback
        $trackbacks[$t]['trackback_success'] = trackback_ping($trackback_url, $mytrackbackdata);
    }
}

This the actual trackback post using cUrl. cUrl has a convenient timeout setting, I use three seconds. If a host does not respond in half a second it’s probably dead. Three seconds is generous.

function trackback_ping($trackback_url, $trackback)
	{

//make a string of the data array to post
	foreach($trackback as $key=>$value) $strout[]=$key."=".rawurlencode($value);
        $postfields= implode('&', $strout);
		
//create a curl instance
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, $trackback_url);
	curl_setopt($ch, CURLOPT_TIMEOUT, 3);
	curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

//set a custom form header
	curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-type: application/x-www-form-urlencoded'));

	curl_setopt($ch, CURLOPT_NOBODY, true);

        curl_setopt($ch, CURLOPT_POST, true);
	curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);	
		
	$content = curl_exec($ch);

//if the return has a tag 'error' with as value 0 it went flawless
	$success = 0;	
	if(strpos($content, '>0')>0) $success = 1; 
	curl_close ($ch);
	unset($ch);
	return $success;
	}

Now the last routine : rewrite the cached xml file with only the successful trackbacks (seo stuff) :

for($t=0;$t0) {
        $store_trackbacks[]=$trackbacks[$t];
    }
}
cache_xml_store($store_trackbacks, $pagetitle);

voila : a page with only successful trackbacks.

Google (the backrub engine) don’t like sites that use automated link-building methods, other engines (Baidu, MSN, Yahoo) use a more normal link popularity keyword matching algorithm. Trackback linking helps getting you a clear engine profile at relative low cost.

0) for brevity and clarity, the code above is rewritten (taken from a trackback script I am developing on another site), it can contain some typo’s.

*1) If you want to spider links for rdf-segments : TYPO3v4 have some code for easy retrieval of trackback-uri’s :

/**
	 * Fetches ping url from the given url
	 *
	 * @param	string	$url	URL to probe for RDF
	 * @return	string	Ping URL
	 */
	protected function getPingURL($url) {
		$pingUrl = '';
		// Get URL content
		$urlContent = t3lib_div::getURL($url);
		if ($urlContent && ($rdfPos = strpos($urlContent, '', $rdfPos)) !== false) {
				// We will use quick regular expression to find ping URL
				$rdfContent = substr($urlContent, $rdfPos, $endPos);
				$pingUrl = preg_replace('/trackback:ping="([^"]+)"/', '\1', $rdfContent);
			}
		}
		return $pingUrl;
	}
Posted in links, php, seo tips and tricks and tagged , , , .

4 Comments

  1. I get about 10% backlinks when I spider, more than I expected as the site doesn’t have any content of itself. Buying links or using tnx is more effective, as background process this works better. I want a stand alone script with an xml cache so I can use it on other sites without being dependant on script capabilities.

Leave a Reply

Your email address will not be published. Required fields are marked *