curl trackbacks
juust | 25/03/2009
I figured I’d blog a post on trackback linkbuilding. A trackback is … (post a few and you’ll get it). The trackback protocol itself isn’t that interesting, but the way blog platforms and CMSes implement it makes it an excellent means for network development, because it uses a simple http-post, and cUrl makes that easy.
To post a successful link proposal I need some basic data :
about my page
- url (must exist)
- blog owner (free)
- blog name (free)
about the other page
- url (must exist)
- excerpt (should be proper normal text)
my page : this is preferably a php routine that hacks some text, pictures and videos, PLR or articles together, with a url rewrite. I prefer using xml text files instead of a database; that works faster when you are setting stuff up.
other page : don’t use “I liked your article so much…”, use text that matches text on the target pages. Preferably get some proper excerpts from xml-feeds like blogsearch, msn and yahoo (the excerpts contain the keywords I searched for, and as anchor text that works better for search engine visibility and link value).
Let’s get some stuff from the MSN rss feed :
    //a generic query = 5% success
    //add "(powered by) wordpress"
    //urlencode turns the spaces into +, a literal + would become %2B
    $query = urlencode('keywords wordpress trackback');
    $xml = @simplexml_load_file("http://search.live.com/results.aspx?q=$query&count=50&first=1&format=rss");
    $trackbacks = array();
    $count = 0;
    //simplexml_load_file returns false when the feed cannot be read
    if ($xml !== false) {
        foreach ($xml->channel->item as $i) {
            $count++;

            //the data from msn
            $target = array();
            $target['link'] = (string) $i->link;
            $target['title'] = (string) $i->title;
            $target['excerpt'] = (string) $i->description;

            //some variables I'll need later on
            $target['id'] = $count;
            $target['trackback'] = '';
            $target['trackback_success'] = 0;

            $trackbacks[] = $target;
        }
    }
About 25% of the cms sites at the top of the search engines are Wordpress scripts, and Wordpress always uses /trackback/ in the rdf-url. I fetch the source of each url in the search-feed and grab all link-urls in it; if any contains /trackback/, I post a trackback to that url and see if it sticks.
(I could also spider all links and check whether there is an rdf-segment in the target’s source (*1), but that takes a lot of time. I could also program a curl array and use multicurl, but for my purposes this works fast enough.)
    for ($t = 0; $t < count($trackbacks); $t++) {
        //I could use curl
        //but 95% of the urls offered are kosher and respond fast
        $content = @file_get_contents($trackbacks[$t]['link']);
        //skip pages that could not be fetched
        if ($content === false) continue;
        //no & before $matches : call-time pass-by-reference is a fatal error since php 5.4
        preg_match_all("/a[\s]+[^>]*?href[\s]?=[\s\"\']+".
            "(.*?)[\"\']+.*?>"."([^< ]+|.*?)?<\/a>/",
            $content, $matches);
        $uri_array = $matches[1];
        foreach ($uri_array as $key => $link) {
            //stripos with !== false also catches 'Trackback' and a match at position 0
            if (stripos($link, 'trackback') !== false) {
                $trackbacks[$t]['trackback'] = $link;
                break;
            }
        }
    }
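The link scan above leans on a fairly loose regular expression. As an alternative, here is a minimal sketch that walks the anchors with PHP's bundled DOM extension instead. The function name find_trackback_link and the example.com urls are made up for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper (not from the original script): returns the first
// href that contains "trackback" (case-insensitive), or '' if none is found.
function find_trackback_link($html)
{
    if (!is_string($html) || trim($html) === '') {
        return '';
    }
    $doc = new DOMDocument();
    // Real-world pages are rarely valid html, so keep parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if (stripos($href, 'trackback') !== false) {
            return $href;
        }
    }
    return '';
}

// Usage with a made-up page
$html = '<html><body><a href="http://example.com/about/">About</a>'
      . '<a href="http://example.com/post-1/trackback/">Trackback</a></body></html>';
echo find_trackback_link($html), "\n";
```

Compared with the regex it only looks at real href attributes, at the cost of parsing the whole page.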
When I fire a trackback, the other script will try to check whether my page has a link and matching text. I have to make sure my page shows the excerpts and links, so I stuff all candidates in a cached xml file.
    function cache_xml_store($trackbacks, $pagetitle)
    {
        $xml = '<?xml version="1.0" encoding="UTF-8"?>'."\n".'<trackbacks>';
        for ($a = 0; $a < count($trackbacks); $a++) {
            $arr = $trackbacks[$a];
            //escape the values : a bare & or < in an excerpt breaks the xml
            $xml .= '<entry>';
            $xml .= '<id>'.$arr['id'].'</id>';
            $xml .= '<excerpt>'.htmlspecialchars($arr['excerpt'], ENT_QUOTES, 'UTF-8').'</excerpt>';
            $xml .= '<link>'.htmlspecialchars($arr['link'], ENT_QUOTES, 'UTF-8').'</link>';
            $xml .= '<title>'.htmlspecialchars($arr['title'], ENT_QUOTES, 'UTF-8').'</title>';
            $xml .= '</entry>';
        }
        $xml .= '</trackbacks>';

        $fname = 'cache/trackback'.urlencode($pagetitle).'.xml';
        //mode 'w' truncates an existing file, so no unlink is needed
        $fhandle = fopen($fname, 'w');
        fwrite($fhandle, $xml);
        fclose($fhandle);
        return;
    }
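String concatenation works, but every value has to be escaped by hand. A minimal sketch of the same cache document built with DOMDocument follows, assuming the dom extension is available. The function name build_trackback_xml and the sample entry are hypothetical, only the element names come from the routine above.

```php
<?php
// Hypothetical helper: serialises trackback candidates to an xml string.
// Element names (trackbacks, entry, id, excerpt, link, title) follow the
// cache format used in this post.
function build_trackback_xml(array $trackbacks)
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $root = $doc->appendChild($doc->createElement('trackbacks'));
    foreach ($trackbacks as $arr) {
        $entry = $root->appendChild($doc->createElement('entry'));
        foreach (array('id', 'excerpt', 'link', 'title') as $field) {
            $node = $entry->appendChild($doc->createElement($field));
            // Text nodes are escaped when the document is saved,
            // so & and < in an excerpt cannot break the file.
            $node->appendChild($doc->createTextNode((string) $arr[$field]));
        }
    }
    return $doc->saveXML();
}

// Usage with a made-up entry
$xml = build_trackback_xml(array(
    array(
        'id'      => 1,
        'excerpt' => 'Tom & Jerry <b>rule</b>',
        'link'    => 'http://example.com/post-1/?a=1&b=2',
        'title'   => 'Post 1',
    ),
));
$back = simplexml_load_string($xml);
echo (string) $back->entry[0]->excerpt, "\n";
```

The string returned by build_trackback_xml can be written to the cache file with fopen and fwrite exactly as before.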
I use simplexml to read that cached file and show the excerpts and links once the page is requested.
    // retrieve the cached xml and return it as array.
    function cache_xml_retrieve($pagetitle)
    {
        $fname = 'cache/trackback'.urlencode($pagetitle).'.xml';
        if (file_exists($fname)) {
            $xml = @simplexml_load_file($fname);
            if (!$xml) return false;
            $trackbacks = array();
            foreach ($xml->entry as $e) {
                $trackback = array();
                $trackback['id'] = (string) $e->id;
                //my own script runs the link through a helper rid(), which is not shown here
                $trackback['link'] = (string) $e->link;
                $trackback['title'] = (string) $e->title;
                //the store routine writes <excerpt>, so read that element
                $trackback['excerpt'] = (string) $e->excerpt;

                $trackbacks[] = $trackback;
            }
            return $trackbacks;
        }
        return false;
    }
(this setup requires a subdirectory cache set to read/write with chmod 777)
I use http://www.domain.com/financial+trends.html and extract the pagetitle as “financial trends”, which has an xml-file http://www.domain.com/cache/trackbackfinancial+trends.xml (the routines above prefix the file name with “trackback”). In my own script I use sef urls with mod_rewrite; you can also use the $_SERVER array.
    $pagetitle = preg_replace('/\+/', ' ', htmlentities($_REQUEST['title'], ENT_QUOTES, "UTF-8"));

    $cached_excerpts = cache_xml_retrieve($pagetitle);
    //cache_xml_retrieve returns false when there is no usable cache file
    if ($cached_excerpts === false) $cached_excerpts = array();

    //do some stuff with it, make it look nice :
    for ($s = 0; $s < count($cached_excerpts); $s++) {
        //this lists the trackback (candidates)
        echo $cached_excerpts[$s]['excerpt'];
        echo '<a href="'.$cached_excerpts[$s]['link'].'">'.$cached_excerpts[$s]['title'].'</a>';
    }
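For the variant without a title request parameter, here is a minimal sketch of how the page title could be derived from the requested path. The helper name page_title_from_path is hypothetical; on a live request the path could come from parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH).

```php
<?php
// Hypothetical helper: "/financial+trends.html" -> "financial trends".
function page_title_from_path($path)
{
    // basename() drops the directories and the ".html" suffix.
    $slug = basename($path, '.html');
    // urldecode() turns "+" (and %20) back into spaces.
    return urldecode($slug);
}

// Usage with the example url from this post
$path = parse_url('http://www.domain.com/financial+trends.html', PHP_URL_PATH);
echo page_title_from_path($path), "\n";
```

The result still has to be escaped before it is echoed into a page, as the htmlentities call above does.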
Now I prepare the data for the trackback post :
    for ($t = 0; $t < count($trackbacks); $t++) {

        $trackback_url = $trackbacks[$t]['trackback'];
        //does it have a trackback target url ? then prepare data :
        if ($trackback_url != '') {
            $trackback_data = array(
                "url" => "url of my page with the link to the target",
                "title" => "title of my page",
                "blog_name" => "name of my blog",
                //the feed loop stores the msn description under 'excerpt'
                "excerpt" => '[...]'.trim(substr($trackbacks[$t]['excerpt'], 0, 150)).'[...]'
            );
            //…and try the trackback
            $trackbacks[$t]['trackback_success'] = trackback_ping($trackback_url, $trackback_data);
        }
    }
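The data array ends up as a form-encoded post body. As a sketch of that step on its own, PHP's built-in http_build_query with PHP_QUERY_RFC3986 produces the same key=value pairs as a manual rawurlencode loop. The values below are made up for illustration.

```php
<?php
// Made-up example values, only the four field names come from the post.
$trackback_data = array(
    'url'       => 'http://www.domain.com/financial+trends.html',
    'title'     => 'financial trends',
    'blog_name' => 'my blog',
    'excerpt'   => '[...]some text & more[...]',
);

// PHP_QUERY_RFC3986 encodes spaces as %20, like rawurlencode() does.
$postfields = http_build_query($trackback_data, '', '&', PHP_QUERY_RFC3986);
echo $postfields, "\n";
```

The resulting string can be handed to CURLOPT_POSTFIELDS as is.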
This is the actual trackback post using cUrl. cUrl has a convenient timeout setting; I use three seconds. If a host does not respond in half a second it’s probably dead, so three seconds is generous.
    function trackback_ping($trackback_url, $trackback)
    {

        //make a string of the data array to post
        $strout = array();
        foreach ($trackback as $key => $value) $strout[] = $key."=".rawurlencode($value);
        $postfields = implode('&', $strout);

        //create a curl instance
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $trackback_url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 3);
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

        //set a custom form header
        curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-type: application/x-www-form-urlencoded'));

        //no CURLOPT_NOBODY here : I need the response body to check the result
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);

        $content = curl_exec($ch);
        curl_close($ch);
        unset($ch);

        //curl_exec returns false on a timeout or connection error
        if ($content === false) return 0;

        //if the return has a tag 'error' with as value 0 it went flawless
        $success = 0;
        if (preg_match('/<error>\s*0\s*<\/error>/', $content)) $success = 1;
        return $success;
    }
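The trackback specification defines the reply as a small xml document: a response element with error set to 0 on success, or error set to 1 plus a message on failure. A minimal sketch of parsing that reply with simplexml, rather than searching the raw string, could look like this. The helper name trackback_response_status and the sample replies are hypothetical.

```php
<?php
// Hypothetical helper: returns array(success, message) for a response body.
function trackback_response_status($body)
{
    if (!is_string($body) || trim($body) === '') {
        return array(false, 'empty response');
    }
    // Keep libxml quiet when the host answers with something that is not xml.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string(trim($body));
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false || !isset($xml->error)) {
        return array(false, 'not a trackback response');
    }
    $ok = trim((string) $xml->error) === '0';
    return array($ok, $ok ? '' : (string) $xml->message);
}

// Usage with two made-up replies
$good = '<?xml version="1.0" encoding="utf-8"?><response><error>0</error></response>';
$bad  = '<?xml version="1.0" encoding="utf-8"?><response><error>1</error>'
      . '<message>Duplicate ping</message></response>';
var_dump(trackback_response_status($good));
var_dump(trackback_response_status($bad));
```

Keeping the message around makes it easier to see why a target refused the ping.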
Now the last routine : rewrite the cached xml file with only the successful trackbacks (seo stuff) :
    //start with an empty array, in case no trackback stuck at all
    $store_trackbacks = array();
    for ($t = 0; $t < count($trackbacks); $t++) {
        if ($trackbacks[$t]['trackback_success'] > 0) {
            $store_trackbacks[] = $trackbacks[$t];
        }
    }
    cache_xml_store($store_trackbacks, $pagetitle);
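The same selection can be sketched with array_filter. array_values is needed afterwards because array_filter keeps the original keys, while the store routine loops over the indexes 0, 1, 2 and so on. The sample data and the callback name is_successful_trackback are made up.

```php
<?php
// Made-up sample data with the same keys as the candidates in this post.
$trackbacks = array(
    array('id' => 1, 'trackback_success' => 1),
    array('id' => 2, 'trackback_success' => 0),
    array('id' => 3, 'trackback_success' => 1),
);

// Hypothetical callback: true for candidates whose ping was accepted.
function is_successful_trackback($candidate)
{
    return $candidate['trackback_success'] > 0;
}

// array_filter() preserves keys (0 and 2 here), array_values() renumbers them.
$store_trackbacks = array_values(array_filter($trackbacks, 'is_successful_trackback'));
echo count($store_trackbacks), "\n";
```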
voila : a page with only successful trackbacks.
Google (the backrub engine) doesn’t like sites that use automated link-building methods; other engines (Baidu, MSN, Yahoo) use a more normal link popularity keyword matching algorithm. Trackback linking helps you get a clear engine profile at relatively low cost.
0) for brevity and clarity, the code above is rewritten (taken from a trackback script I am developing on another site), so it can contain some typos.
*1) If you want to spider links for rdf-segments : TYPO3v4 has some code for easy retrieval of trackback-uris :
    /**
     * Fetches ping url from the given url
     *
     * @param string $url URL to probe for RDF
     * @return string Ping URL
     */
    protected function getPingURL($url) {
        $pingUrl = '';
        // Get URL content
        $urlContent = t3lib_div::getURL($url);
        if ($urlContent && ($rdfPos = strpos($urlContent, '<rdf:RDF')) !== false) {
            // RDF exists in this content. Get it and parse
            $urlContent = substr($urlContent, $rdfPos);
            if (($endPos = strpos($urlContent, '</rdf:RDF>', $rdfPos)) !== false) {
                // We will use quick regular expression to find ping URL
                $rdfContent = substr($urlContent, $rdfPos, $endPos);
                $pingUrl = preg_replace('/trackback:ping="([^"]+)"/', '\1', $rdfContent);
            }
        }
        return $pingUrl;
    }
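For use outside TYPO3, here is a standalone sketch of the same idea that works on an html string you have already fetched. It differs from the snippet as reproduced above in two ways: all offsets stay relative to the full string, and preg_match captures only the attribute value instead of rewriting the whole rdf block. The function name extract_ping_url and the sample page are hypothetical.

```php
<?php
// Hypothetical standalone helper: returns the trackback:ping url found in the
// first <rdf:RDF> block of $html, or '' if there is none.
function extract_ping_url($html)
{
    $start = strpos($html, '<rdf:RDF');
    if ($start === false) {
        return '';
    }
    // Search for the closing tag from the opening tag onwards.
    $end = strpos($html, '</rdf:RDF>', $start);
    if ($end === false) {
        return '';
    }
    $rdf = substr($html, $start, $end - $start);
    if (preg_match('/trackback:ping="([^"]+)"/', $rdf, $match)) {
        return $match[1];
    }
    return '';
}

// Usage with a made-up page that embeds trackback auto-discovery rdf
$html = '<html><body><p>post</p><!--
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
<rdf:Description rdf:about="http://example.com/post-1/"
    trackback:ping="http://example.com/post-1/trackback/" />
</rdf:RDF>
--></body></html>';
echo extract_ping_url($html), "\n";
```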








