curl trackbacks

I figured I’d blog a post on trackback linkbuilding. A trackback is … (post a few and you’ll get it). The trackback protocol isn’t that interesting, but its implementation by blog platforms and CMSes makes it an excellent means for network development, because it uses a simple HTTP POST, and cURL makes that easy.

To post a successful link proposal I need some basic data:

about my page

  • url (must exist)
  • blog owner (free)
  • blog name (free)

about the other page

  • url (must exist)
  • excerpt (should be proper normal text)
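For reference, the ping those fields feed into is just a form-encoded POST to the target’s trackback URL (per the Movable Type trackback specification). A minimal sketch of the payload, with hypothetical placeholder values:

```php
<?php
// A trackback ping is a plain form-encoded POST to the target's
// trackback URL. All values below are hypothetical placeholders.
$fields = array(
    'url'       => 'http://www.mydomain.com/financial+trends.html', // must exist
    'title'     => 'Financial trends',
    'blog_name' => 'My blog',
    'excerpt'   => 'A short normal-text excerpt [...]',
);
$body = http_build_query($fields);
// The receiving blog answers with a small XML document:
//   <response><error>0</error></response>                           on success
//   <response><error>1</error><message>reason</message></response>  on failure
echo $body;
```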

my page: this is preferably a PHP routine that hacks some text, pictures and videos, PLR or articles together, with a URL rewrite. I prefer using XML text files instead of a database; it works faster when you set stuff up.

other page: don’t use “I liked your article so much…”; use text that matches text on the target pages, preferably proper excerpts from XML feeds like blogsearch, MSN and Yahoo (excerpts contain the keywords I searched for; as anchor text that works better for search engine visibility and link value).

Let’s get some stuff from the MSN RSS feed:

//a generic query = 5% success
//add "(powered by) wordpress" to target WordPress blogs
$query = urlencode('keywords wordpress trackback');
$xml = @simplexml_load_file("http://search.live.com/results.aspx?q=$query&count=50&first=1&format=rss");
$count = 0;
foreach($xml->channel->item as $i) {
    $count++;
    //the data from msn
    $target['link'] = (string) $i->link;
    $target['title'] = (string) $i->title;
    $target['excerpt'] = (string) $i->description;
    //some variables I'll need later on
    $target['id'] = $count;
    $target['trackback'] = '';
    $target['trackback_success'] = 0;
    $trackbacks[] = $target;
}

25% of the CMS sites in the top of the search engines are WordPress scripts, and WordPress always uses /trackback/ in the RDF URL. I get the source of the URLs in the search feed and grab all link URLs in it; if any contains /trackback/, I post a trackback to that URL and see if it sticks.

(I could also spider all links and check if there is an RDF segment in the target’s source (*1), but that takes a lot of time. I could also program a cURL array and use multi-cURL; for my purposes this works fast enough.)
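Since the scrape below leans on that WordPress convention anyway, a cheap shortcut is to guess the endpoint straight from the permalink instead of fetching the page first; a sketch under that assumption (the function name is my own):

```php
<?php
// WordPress conventionally exposes the trackback endpoint at
// <permalink>/trackback/ — guessing it skips a full page fetch.
// Only valid for default WordPress permalink setups.
function guess_wp_trackback_url($permalink) {
    return rtrim($permalink, '/') . '/trackback/';
}
echo guess_wp_trackback_url('http://example.com/2009/05/some-post/');
```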

for($t=0;$t<count($trackbacks);$t++) {
    $content = @file_get_contents($trackbacks[$t]['link']);
    preg_match_all("/<a[^>]*?href[\s]?=[\s\"\']+".
        "(.*?)[\"\']+.*?>"."([^<]+|.*?)?<\/a>/",
        $content, $matches);
    $uri_array = $matches[1];
    foreach($uri_array as $key => $link) {
        if(strpos($link, 'rackbac')>0) {
            $trackbacks[$t]['trackback'] = $link;
            break;
        }
    }
}

When I fire a trackback, the other script will try to check whether my page has a link and matching text. I have to make sure my page shows the excerpts and links, so I stuff all candidates in a cached XML file.

function cache_xml_store($trackbacks, $pagetitle)
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>'."\n".'<trackbacks>';
    for($a=0;$a<count($trackbacks);$a++) {
        $arr = $trackbacks[$a];
        $xml .= '<entry>';
        $xml .= '<id>'.$arr['id'].'</id>';
        $xml .= '<excerpt>'.htmlspecialchars($arr['excerpt']).'</excerpt>';
        $xml .= '<link>'.htmlspecialchars($arr['link']).'</link>';
        $xml .= '<title>'.htmlspecialchars($arr['title']).'</title>';
        $xml .= '</entry>';
    }
    $xml .= '</trackbacks>';
    $fname = 'cache/trackback'.urlencode($pagetitle).'.xml';
    if(file_exists($fname)) unlink($fname);
    $fhandle = fopen($fname, 'w');
    fwrite($fhandle, $xml);
    fclose($fhandle);
    return;
}

I use simplexml to read that cached file and show the excerpts and links once the page is requested.

// retrieve the cached xml and return it as array.
function cache_xml_retrieve($pagetitle)
{
    $fname = 'cache/trackback'.urlencode($pagetitle).'.xml';
    if(file_exists($fname)) {
        $xml = @simplexml_load_file($fname);
        if(!$xml) return false;
        $trackbacks = array();
        foreach($xml->entry as $e) {
            $trackback['id'] = (string) $e->id;
            $trackback['link'] = (string) $e->link;
            $trackback['title'] = (string) $e->title;
            $trackback['excerpt'] = (string) $e->excerpt;
            $trackbacks[] = $trackback;
        }
        return $trackbacks;
    }
    return false;
}

(This setup requires a subdirectory cache set to read/write with chmod 777.)
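To avoid a failed fopen() on a fresh deployment, the script could create that directory itself; a small sketch (the helper name is mine):

```php
<?php
// Create the cache directory if it's missing and report whether it's
// usable; 0777 matches the chmod advice above (tighten on shared hosts).
function ensure_cache_dir($path) {
    if (!is_dir($path)) {
        mkdir($path, 0777, true);
    }
    return is_dir($path) && is_writable($path);
}
```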

I use http://www.domain.com/financial+trends.html and extract the page title as “financial trends”, which has an XML file http://www.domain.com/cache/financial+trends.xml. (In my own script I use SEF URLs with mod_rewrite; you can also use the $_SERVER array.)
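With the $_SERVER variant, the title can be pulled from the request path; a sketch under that assumption (the helper name is mine):

```php
<?php
// Derive the page title from a SEF url like /financial+trends.html,
// e.g. taken from $_SERVER['REQUEST_URI'].
function pagetitle_from_uri($uri) {
    $base = basename(parse_url($uri, PHP_URL_PATH));
    $base = preg_replace('/\.html$/', '', $base);
    return urldecode($base); // '+' decodes to a space
}
echo pagetitle_from_uri('/financial+trends.html'); // financial trends
```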

$pagetitle = preg_replace('/\+/', ' ', htmlentities($_REQUEST['title'], ENT_QUOTES, "UTF-8"));
$cached_excerpts = cache_xml_retrieve($pagetitle);
//do some stuff with it, make it look nice:
for($s=0;$s<count($cached_excerpts);$s++) {
    echo '<a href="'.$cached_excerpts[$s]['link'].'">'.$cached_excerpts[$s]['title'].'</a>';
}

Now I prepare the data for the trackback post:

for($t=0;$t<count($trackbacks);$t++) {
    if($trackbacks[$t]['trackback'] != '') {
        $mytrackbackdata = array(
            "url" => "url of my page with the link to the target",
            "title" => "title of my page",
            "blog_name" => "name of my blog",
            "excerpt" => '[...]'.trim(substr($trackbacks[$t]['excerpt'], 0, 150)).'[...]'
        );
        //...and try the trackback
        $trackbacks[$t]['trackback_success'] = trackback_ping($trackbacks[$t]['trackback'], $mytrackbackdata);
    }
}

This is the actual trackback post using cURL. cURL has a convenient timeout setting; I use three seconds. If a host does not respond within half a second it’s probably dead, so three seconds is generous.

function trackback_ping($trackback_url, $trackback)
{
    //make a string of the data array to post
    foreach($trackback as $key=>$value) $strout[] = $key."=".rawurlencode($value);
    $postfields = implode('&', $strout);
    //create a curl instance
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $trackback_url);
    curl_setopt($ch, CURLOPT_TIMEOUT, 3);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    //set a custom form header
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-type: application/x-www-form-urlencoded'));
    //don't use CURLOPT_NOBODY here: we need the response body to check for success
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);
    $content = curl_exec($ch);
    //if the response has an 'error' tag with value 0, the ping went flawless
    $success = 0;
    if(strpos($content, '<error>0</error>') !== false) $success = 1;
    curl_close($ch);
    unset($ch);
    return $success;
}

Now the last routine: rewrite the cached XML file with only the successful trackbacks (SEO stuff):

for($t=0;$t<count($trackbacks);$t++) {
    if($trackbacks[$t]['trackback_success']>0) {
        $store_trackbacks[] = $trackbacks[$t];
    }
}
cache_xml_store($store_trackbacks, $pagetitle);

Voilà: a page with only successful trackbacks.

Google (the BackRub engine) doesn’t like sites that use automated link-building methods; other engines (Baidu, MSN, Yahoo) use a more conventional link popularity and keyword matching algorithm. Trackback linking helps you get a clear engine profile at relatively low cost.

0) For brevity and clarity, the code above is rewritten (taken from a trackback script I am developing on another site); it can contain some typos.

*1) If you want to spider links for RDF segments: TYPO3 v4 has some code for easy retrieval of trackback URIs:

/**
 * Fetches ping url from the given url
 *
 * @param	string	$url	URL to probe for RDF
 * @return	string	Ping URL
 */
protected function getPingURL($url) {
	$pingUrl = '';
	// Get URL content
	$urlContent = t3lib_div::getURL($url);
	if ($urlContent && ($rdfPos = strpos($urlContent, '<rdf:RDF')) !== false) {
		// RDF exists in the document, isolate it
		if (($endPos = strpos($urlContent, '</rdf:RDF>', $rdfPos)) !== false) {
			// We will use quick regular expression to find ping URL
			$rdfContent = substr($urlContent, $rdfPos, $endPos - $rdfPos);
			if (preg_match('/trackback:ping="([^"]+)"/', $rdfContent, $matches)) {
				$pingUrl = $matches[1];
			}
		}
	}
	return $pingUrl;
}

4 thoughts on “curl trackbacks”

  1. I get about 10% backlinks when I spider, more than I expected as the site doesn’t have any content of its own. Buying links or using tnx is more effective; as a background process this works better. I want a stand-alone script with an XML cache so I can use it on other sites without being dependent on script capabilities.
