curl trackbacks

I figured I’d blog a post on trackback link building. A trackback is … (post a few and you’ll get it). The trackback protocol itself isn’t that interesting, but the way blog platforms and CMSes implement it makes it an excellent means for network development, because it uses a simple http post, and cURL makes that easy.

To post a successful link proposal I need some basic data :

about my page

  • url (must exist)
  • blog owner (free)
  • blog name (free)

about the other page

  • url (must exist)
  • excerpt (should be proper normal text)

my page : this is preferably a php routine that hacks some text, pictures and videos, PLR or articles together, with a url rewrite. I prefer using xml text files instead of a database; it works faster when you set stuff up.

other page : don’t use “I liked your article so much…”, use text that matches text on the target pages. Preferably get some proper excerpts from xml feeds like blogsearch, msn and yahoo (the excerpts contain the keywords I searched for; as anchor text that works better for search engine visibility and link value).

Let’s get some stuff from the MSN rss feed :

//a generic query = 5% success
//add "(powered by) wordpress" 
      $query=urlencode('keywords wordpress trackback'); //urlencode turns the spaces into +
      $xml = @simplexml_load_file("http://search.live.com/results.aspx?q=$query&count=50&first=1&format=rss");
      $count=0;
      foreach($xml->channel->item as $i) {

           $count++;

//the data from msn
           $target['link'] = (string) $i->link;
           $target['title'] = (string) $i->title;
           $target['excerpt'] = (string) $i->description;

//some variables I'll need later on
           $target['id'] = $count;
           $target['trackback'] = '';
           $target['trackback_success'] = 0;

           $trackbacks[]=$target;
       }

About 25% of the cms sites at the top of the search engines run WordPress, and WordPress always uses /trackback/ in the rdf-url. I fetch the source of each url in the search feed and grab all the link urls in it; if any of them contains /trackback/, I post a trackback to that url and see if it sticks.

(I could also spider all links and check for an rdf segment in the target’s source (*1), but that takes a lot of time. I could also build an array of curl handles and use multicurl; for my purposes this works fast enough.)

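for($t=0;$t<count($trackbacks);$t++) {
//get the source of the target page (the fetch call was lost from this listing, file_get_contents is an assumption)
	$content = @file_get_contents($trackbacks[$t]['link']);
	if($content === false) continue;
//grab all anchors, $matches[1] holds the href values
	preg_match_all("/<a[\s]+[^>]*?href[\s]?=[\s\"\']+".
           "(.*?)[\"\']+.*?>"."([^< ]+|.*?)?<\/a>/",
        $content, $matches);
	$uri_array = $matches[1];
	foreach($uri_array as $key => $link) { 
             if(stripos($link, 'trackback') !== false) { 
                $trackbacks[$t]['trackback'] = $link;
                break; 
             }
        }
}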
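The link scan above relies on a regular expression. As an alternative, here is a minimal sketch that walks the anchors with PHP's DOM extension instead. The function name find_trackback_link() and the sample markup are made up for illustration, and it assumes the dom extension is loaded.

```php
<?php
// Hypothetical helper: return the first href containing "trackback", or '' if none.
function find_trackback_link($html)
{
    if (trim($html) === '') return '';
    $doc = new DOMDocument();
    // real-world html is rarely valid, so keep the parser warnings quiet
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if (stripos($href, 'trackback') !== false) return $href;
    }
    return '';
}

// made-up sample markup
$sample = '<p><a href="http://example.com/about/">about</a> '
        . '<a href="http://example.com/2009/01/post/trackback/">trackback</a></p>';
echo find_trackback_link($sample), "\n";
```

The DOM parser copes with attribute quoting and odd whitespace that a regular expression can trip over, at the price of parsing the whole page.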

When I fire a trackback, the other script will try to verify that my page has a link and matching text. I have to make sure my page shows the excerpts and links, so I stuff all candidates in a cached xml file.

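function cache_xml_store($trackbacks, $pagetitle) 
{
//the root element name 'trackbacks' is an assumption, the tags were lost from this listing
	$xml = '<?xml version="1.0" encoding="UTF-8"?>
	<trackbacks>';
	for($a=0;$a<count($trackbacks);$a++) {
		$arr = $trackbacks[$a];
		$xml .= '<entry>';
		$xml .= '<id>'.$arr['id'].'</id>';
		$xml .= '<description>'.htmlspecialchars($arr['excerpt']).'</description>';
		$xml .= '<link>'.htmlspecialchars($arr['link']).'</link>';
		$xml .= '<title>'.htmlspecialchars($arr['title']).'</title>';
		$xml .= '</entry>';
	}
	$xml .= '</trackbacks>';
	
	$fname = 'cache/trackback'.urlencode($pagetitle).'.xml';
//$fname already holds the cache/ prefix
	if(file_exists($fname)) unlink($fname);
	$fhandle = fopen($fname, 'w');
	fwrite($fhandle, $xml);
	fclose($fhandle);
	return;
}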

I use simplexml to read that cached file and show the excerpts and links once the page is requested.

// retrieve the cached xml and return it as array.
function cache_xml_retrieve($pagetitle)
{
	$fname = 'cache/trackback'.urlencode($pagetitle).'.xml';
	if(file_exists($fname)) {
		$xml=@simplexml_load_file($fname);
		if($xml === false) return false;
		$trackbacks = array();
		foreach($xml->entry as $e) {
			$trackback = array();
			$trackback['id'] =(string) $e->id;
//rid() is a helper that is not listed in this post
			$trackback['link'] =  rid((string) $e->link);
			$trackback['title'] =  (string) $e->title;
			$trackback['description'] =  (string) $e->description;

			$trackbacks[] = $trackback;
		}
		return $trackbacks;
	} 
	return false;
}

(this setup requires a subdirectory cache set to read/write with chmod 777)
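Gluing xml together by hand breaks as soon as an excerpt contains an ampersand or a tag. Below is a rough sketch of the same store and retrieve round trip using SimpleXMLElement for the writing as well. The names store_entries() and load_entries(), the root element trackbacks and the sample entry are assumptions for the demo; it writes to the system temp directory instead of cache/.

```php
<?php
// Hypothetical helpers; the root element name "trackbacks" is an assumption.
function store_entries($fname, array $entries)
{
    $xml = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><trackbacks/>');
    foreach ($entries as $e) {
        $entry = $xml->addChild('entry');
        foreach (array('id', 'link', 'title', 'description') as $field) {
            // assigning through the property escapes &, < and > for us
            $entry->$field = isset($e[$field]) ? (string) $e[$field] : '';
        }
    }
    return $xml->asXML($fname);
}

function load_entries($fname)
{
    if (!file_exists($fname)) return false;
    $xml = @simplexml_load_file($fname);
    // compare with false: an element without children also evaluates to false
    if ($xml === false) return false;
    $entries = array();
    foreach ($xml->entry as $e) {
        $entries[] = array(
            'id'          => (string) $e->id,
            'link'        => (string) $e->link,
            'title'       => (string) $e->title,
            'description' => (string) $e->description,
        );
    }
    return $entries;
}

// made-up sample entry with characters that need escaping
$fname = sys_get_temp_dir() . '/trackback_demo.xml';
store_entries($fname, array(array(
    'id'          => 1,
    'link'        => 'http://example.com/?a=1&b=2',
    'title'       => 'Trends & <more>',
    'description' => 'made-up excerpt',
)));
$back = load_entries($fname);
echo $back[0]['title'], "\n";
unlink($fname);
```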

I use http://www.domain.com/financial+trends.html and extract the pagetitle as “financial trends”, which has an xml file http://www.domain.com/cache/trackbackfinancial+trends.xml. (In my own script I use sef urls with mod_rewrite; you can also use the $_SERVER array.)

$pagetitle=preg_replace('/\+/', ' ', htmlentities($_REQUEST['title'], ENT_QUOTES, "UTF-8"));

$cached_excerpts = cache_xml_retrieve($pagetitle);

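//do some stuff with it, make it look nice (the markup here is reconstructed) :
for($s=0;$cached_excerpts && $s<count($cached_excerpts);$s++) {
	echo '<p>'.$cached_excerpts[$s]['description'].' <a href="'.$cached_excerpts[$s]['link'].'">'.$cached_excerpts[$s]['title'].'</a></p>';
}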

Now I prepare the data for the trackback post :

for($t=0;$t<count($trackbacks);$t++) {
    if($trackbacks[$t]['trackback']!='') {
        $trackback_url = $trackbacks[$t]['trackback'];
        $mytrackbackdata = array(
	"url" => "url of my page with the link to the target",
 	"title" => "title of my page",
	"blog_name" => "name of my blog",
	"excerpt" => '[...]'.trim(substr($trackbacks[$t]['excerpt'], 0, 150)).'[...]'
        );
        //...and try the trackback
        $trackbacks[$t]['trackback_success'] = trackback_ping($trackback_url, $mytrackbackdata);
    }
}
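The data array ends up as an ordinary url-encoded form body. A minimal sketch of that encoding step with PHP's built-in http_build_query(), as an alternative to looping over the array by hand. The sample values are made up; only the four keys come from the array above.

```php
<?php
// Made-up sample data; the keys are the ones used in the array above.
$mytrackbackdata = array(
    'url'       => 'http://www.domain.com/financial+trends.html',
    'title'     => 'financial trends',
    'blog_name' => 'my blog',
    'excerpt'   => '[...]some matching text[...]',
);

// PHP_QUERY_RFC3986 encodes like rawurlencode() (a space becomes %20)
$postfields = http_build_query($mytrackbackdata, '', '&', PHP_QUERY_RFC3986);
echo $postfields, "\n";
```

The resulting string can be handed to CURLOPT_POSTFIELDS as is.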

This is the actual trackback post using cURL. cURL has a convenient timeout setting; I use three seconds. If a host does not respond within half a second it’s probably dead, so three seconds is generous.

function trackback_ping($trackback_url, $trackback)
	{

//make a string of the data array to post
	$strout = array();
	foreach($trackback as $key=>$value) $strout[]=$key."=".rawurlencode($value);
        $postfields= implode('&', $strout);
		
//create a curl instance
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, $trackback_url);
	curl_setopt($ch, CURLOPT_TIMEOUT, 3);
	curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

//set a custom form header
	curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-type: application/x-www-form-urlencoded'));

//CURLOPT_NOBODY is left unset : it would drop the response body, which is checked below

        curl_setopt($ch, CURLOPT_POST, true);
	curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);	
		
	$content = curl_exec($ch);

//if the return has a tag 'error' with as value 0 it went flawless
	$success = 0;	
	if(is_string($content) && strpos($content, '<error>0</error>')!==false) $success = 1; 
	curl_close ($ch);
	unset($ch);
	return $success;
	}
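A trackback endpoint answers with a small xml document in which an error element holding 0 means the ping was accepted. Instead of a plain string search, the answer can be parsed. This is only a sketch: the helper name trackback_accepted() and the two sample responses are made up.

```php
<?php
// Hypothetical helper: true when the response reports <error>0</error>.
function trackback_accepted($content)
{
    if (!is_string($content) || trim($content) === '') return false;
    // hosts often answer with an html error page, so keep the parser quiet
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string(trim($content));
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($xml === false) return false;
    return isset($xml->error) && trim((string) $xml->error) === '0';
}

// made-up sample responses
$ok   = '<?xml version="1.0" encoding="utf-8"?><response><error>0</error></response>';
$fail = '<?xml version="1.0" encoding="utf-8"?><response><error>1</error>'
      . '<message>made-up error text</message></response>';
var_dump(trackback_accepted($ok));
var_dump(trackback_accepted($fail));
```

Anything that is not well-formed xml, such as a 404 page or a timeout returning false, counts as a failed ping.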

Now the last routine : rewrite the cached xml file with only the successful trackbacks (seo stuff) :

$store_trackbacks = array();
for($t=0;$t<count($trackbacks);$t++) {
    if($trackbacks[$t]['trackback_success']>0) {
        $store_trackbacks[]=$trackbacks[$t];
    }
}
cache_xml_store($store_trackbacks, $pagetitle);

voila : a page with only successful trackbacks.

Google (the BackRub engine) doesn’t like sites that use automated link-building methods; other engines (Baidu, MSN, Yahoo) use a more conventional link popularity and keyword matching algorithm. Trackback linking helps you get a clear engine profile at relatively low cost.

0) For brevity and clarity, the code above is rewritten (taken from a trackback script I am developing on another site), so it can contain some typos.

*1) If you want to spider links for rdf segments : TYPO3v4 has some code for easy retrieval of trackback uris :

/**
	 * Fetches ping url from the given url
	 *
	 * @param	string	$url	URL to probe for RDF
	 * @return	string	Ping URL
	 */
	protected function getPingURL($url) {
		$pingUrl = '';
		// Get URL content
		$urlContent = t3lib_div::getURL($url);
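		if ($urlContent && ($rdfPos = strpos($urlContent, '<rdf:RDF')) !== false) {
			// RDF exists in this content, find where it ends
			$endPos = strpos($urlContent, '</rdf:RDF>', $rdfPos);
			if ($endPos !== false) {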
				// We will use quick regular expression to find ping URL
				$rdfContent = substr($urlContent, $rdfPos, $endPos);
				$pingUrl = preg_replace('/trackback:ping="([^"]+)"/', '\1', $rdfContent);
			}
		}
		return $pingUrl;
	}
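For use outside TYPO3, the same idea fits in a small standalone function. This is a sketch and not the TYPO3 code: the name extract_ping_url() and the sample rdf block are made up, and it uses preg_match() to pick out the attribute value.

```php
<?php
// Hypothetical helper: pull the trackback:ping attribute out of an embedded RDF block.
function extract_ping_url($html)
{
    $start = strpos($html, '<rdf:RDF');
    if ($start === false) return '';
    $end = strpos($html, '</rdf:RDF>', $start);
    if ($end === false) return '';
    // substr() takes a length, not an end position
    $rdf = substr($html, $start, $end - $start);
    if (preg_match('/trackback:ping="([^"]+)"/', $rdf, $m)) {
        return html_entity_decode($m[1], ENT_QUOTES, 'UTF-8');
    }
    return '';
}

// made-up sample, shaped like the rdf comment blogs embed in their pages
$sample = '<!-- <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
<rdf:Description rdf:about="http://example.com/post/"
    trackback:ping="http://example.com/post/trackback/" />
</rdf:RDF> -->';
echo extract_ping_url($sample), "\n";
```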

using ajax readystate 3 polling

This one is not news anymore, but anyway, a friend of mine asked about a function to notify the browser of changes on the backend, so with multiple online users you can notify a user of changes others make.

One way is a socket daemon; the other is the ajax readystate 3 ‘polling’ feature (a bit like a comet server). As the ajax xhr object is a basic http wrapper, it follows the same sequence as a normal http connection. A normal call stays in readystate 3 (receiving data) until the server signals the end of the response; then it moves to readystate 4, where you can pick up the returned http status (200, 404 etc.).

Using the php flush() command inside a running program, you force output to the browser, which triggers a readystate 3 change in an xhr instance. You can pick up on the triggered readystate change and read the new output in the buffer.

A basic demo requires four files :

  • queue.txt  (chmod 777)
  • polling.js
  • polling.php
  • polling.html

I put one queue.txt file on the backend with 777 permission so anyone can read and write it.

Then I make a javascript file containing two calls, startclock and stopclock (and makeXmlHttp() to make an xhr instance). Startclock starts an endless loop and outputs the incremental content of the output buffer to a div in the html file (for the demo I echo time(), that way you can make an ajax digital clock) :

function startclock()
{
        var index = 0;            
        xmlHttp = makeXmlHttp();
        
        xmlHttp.onreadystatechange = function()
        {
                if ( xmlHttp.readyState == 3 )
                {
//grab the new part of the output buffer and write it to a div
				var rtlen = xmlHttp.responseText.length;
			        if (index < rtlen) {
			           document.getElementById("seoresult").innerHTML =  xmlHttp.responseText.substring(index);
			           index = rtlen;
			        }
			}
        }
        xmlHttp.open("POST", "polling.php?action=start", true);
        xmlHttp.send('');
}

stopclock() just calls a php function that writes 'stop' in queue.txt :

function stopclock()
{
        xmlHttp = makeXmlHttp();
        xmlHttp.open("POST", "polling.php?action=stop", true);
        xmlHttp.send('');
}

For the sake of the demo, I added a function stopclock() that writes ‘stop’ to the queue.txt.

Then the polling.php program file : this contains a routine that runs an endless loop and three routines for the queue.txt file (write ‘start’, add ‘stop’, and read content). The endless loop reads queue.txt about ten times a second (usleep(100000) is a tenth of a second). If the word ‘stop’ is in there the loop ends, the php program ends and the xhr call ends. Otherwise the endlessloop function outputs the time and flushes the buffer to the browser :

if($_GET['action']=='start') {
	endlessloop();
} else {
        writestop();
}


function endlessloop() {
//truncate the queue, write 'start'
	writestart();

//get the time
	$start=time();

//using while(1) or while(true) you start an endless loop,
//and use break to end it, I tend to also use a timed end,
//to prevent the program from running on endlessly on the
//server if I break the http connection

	while(1) {
//read the file contents
		$the_Text=readsome();

//check if the word 'stop' is in there
//if so, echo a notification, end the program
		if(strpos($the_Text, "stop")>0) { 
			echo 'clock stopped';
			flush();			
			break;
		}

//after 45 seconds (arbitrary) end the program anyway
		if(time()>($start+45)) {
			echo 'time elapsed';
			flush();			
			break;
		}

//echo the time
		echo time();

//wait for a while
		usleep(100000);

//flush triggers a forced dump of the buffer to the browser
		flush();
	}
}

function writestart() {
//truncate the file, write 'start'
	$fhandle =fopen('queue.txt', 'w');
	fwrite($fhandle, 'start');
	fclose($fhandle);
}
function writestop() {
//write 'stop' 
	$fhandle =fopen('queue.txt', 'a');
	fwrite($fhandle, 'stop');
	fclose($fhandle);
}

function readsome() {
//read the file, return the text contents
	$text = '';
	$fhandle =fopen('queue.txt', 'r');
	while($buffer = fread($fhandle, 1024)) {
		$text.=$buffer;
	}
	fclose($fhandle);
	return $text;	
}
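With several browsers writing to the same queue file, the three file routines can also be written with file_put_contents() and an exclusive lock. A small sketch, with made-up function names that take the file name as a parameter so it can be tried on a temporary file :

```php
<?php
// Hypothetical variants of the queue helpers, not the ones from the demo.
function queue_start($file)
{
    // truncate the file and write 'start'
    return file_put_contents($file, 'start', LOCK_EX) !== false;
}

function queue_stop($file)
{
    // append 'stop'
    return file_put_contents($file, 'stop', FILE_APPEND | LOCK_EX) !== false;
}

function queue_stopped($file)
{
    $text = @file_get_contents($file);
    return $text !== false && strpos($text, 'stop') !== false;
}

$file = tempnam(sys_get_temp_dir(), 'queue');
queue_start($file);
$before = queue_stopped($file);
queue_stop($file);
$after = queue_stopped($file);
var_dump($before, $after);
unlink($file);
```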

If you start the same polling.html in two browser windows you’ll notice that stopping one, also causes the other to stop. Very basic demo.

hands on xml-rpc : copying mysql tables

I don’t have anything to blog on, so I will bore you all with a quick generic function to copy mysql tables from one host to another, using xml-rpc.

I use the Incutio xml-rpc library on both hosts, to handle the tedious stuff (xml formatting and parsing). That leaves only some snippets to send and receive table data and store it on a mysql database.

First : how to handle the table data on the sending end:

  • I take an associative array from a mysql query
  • I make an array to hold the records
  • I add each row as array
  • I make an IXR-client.
  • I add some general parameters
  • I hand these and the entire table array to my IXR-client.
  • send…
//the snippet with the client is at the bottom of the post
$ThisClient = New SerpClient('http://serp.trismegistos.net/db/xmlrpc.php', 'user', 'pass', 'sender');

$tablename = "serp_tags_keys";
$tableid = "id";
$result = $serpdb->query("SELECT * FROM ".$tablename);
$recordcount = mysql_num_rows($result);

$records = array();
while($row=mysql_fetch_assoc($result)) {
//each row is already an associative array of fieldname => value
	$records[]=$row;
}

$ThisClient->putTable($tablename, $recordcount, $tableid, $records);

I consider some additional fields necessary for basic integrity checks : I add “ID” as key field, so on the receiving end the server knows which field is my table’s auto-increment field. Other fields are a username, password, tablename and the batch recordcount.

The IXR_Client then generates a tangled mess of xml tags holding the entire procedure call and data. (You can put the client on ‘debug’, then it dumps the generated xml to the screen.)

The first part of the xml file contains the single parameters :

  • username
  • password
  • tablename
  • recordcount
  • id-field

<methodCall>
<methodName>serp.putTable</methodName>
<params>
<param><value><string>user</string></value></param>
<param><value><string>pass</string></value></param>
<param><value><string>serp_tags_keys</string></value></param>
<param><value><int>91</int></value></param>
<param><value><string>id</string></value></param>

Then the entire table is sent as one parameter in the procedure call.

That parameter is built from an array containing the table rows as ‘struct’. If I want to use the routine for any table, I need the fieldname-value pairs to compose a standard mysql insert statement. A struct type allows me to use key-value pairs in the xml-file that can be parsed back into an array.

<param><value><array>

<data>

<value><struct>
<member><name>id</name><value><string>4</string></value></member>
<member><name>tag</name><value><string>ranking</string></value></member>
<member><name>cat</name><value><string>alexa ranking seo internet ranking internet positi</string></value></member>
<member><name>date</name><value><string>200901</string></value></member>
</struct></value>

<value><struct>
<member><name>id</name><value><string>94</string></value></member>
<member><name>tag</name><value><string>firm</string></value></member>
<member><name>cat</name><value><string>firm seo</string></value></member>
<member><name>date</name><value><string>200901</string></value></member>
</struct></value>

</data>

</array></value></param>

That was the last of the param holding the table, so the entire tag-mess is closed :

</params></methodCall>
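To make the struct encoding less abstract, here is a minimal sketch that turns one table row into the member tags shown above and parses them back into an array. Both functions are made up for illustration and are not part of the Incutio library, which does this work for you.

```php
<?php
// Hypothetical helper: encode one table row as an XML-RPC struct with string members.
function row_to_struct(array $row)
{
    $xml = '<value><struct>';
    foreach ($row as $name => $value) {
        $xml .= '<member><name>' . htmlspecialchars((string) $name) . '</name>'
              . '<value><string>' . htmlspecialchars((string) $value) . '</string></value></member>';
    }
    return $xml . '</struct></value>';
}

// Hypothetical helper: parse the struct back into fieldname => value pairs.
function struct_to_row($xml)
{
    $row = array();
    $value = simplexml_load_string($xml);
    foreach ($value->struct->member as $member) {
        $row[(string) $member->name] = (string) $member->value->string;
    }
    return $row;
}

// one of the rows from the xml above
$row = array('id' => '94', 'tag' => 'firm', 'cat' => 'firm seo', 'date' => '200901');
$encoded = row_to_struct($row);
echo $encoded, "\n";
```

Because every member carries its own name, the receiving end gets the fieldnames along with the values, which is what makes the routine usable for any table.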

Then the second part : on the receiving end the Incutio class parses the whole tag-mess, and hands an array of the param sections as input to my function putTable.

	function putTable($args) 
	{
		$user 	 = $args[0];
		$pass 	 = $args[1];
		$tname 	 = $args[2];
		$tcount	 = $args[3];
		$id 	         = $args[4];	
		$table 	 = $args[5];

$table is a straightforward array holding as items an array ($t) created from the struct with the pairs of fieldname-value. I turn the recordsets key-value struct into a mysql INSERT query :
$query = “INSERT INTO `”.$tname.”` (” field, field… “) VALUES (” fieldvalue, fieldvalue “)”;

All I have to do is add the fieldnames and fieldvalues to the mysql insert query.

		foreach($table as $t) {

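//the fixed parts (every record starts with empty strings)
//note : the values go into the query unescaped, escape them with your database layer first
				$query0 = 'INSERT INTO `'.$tname.'` (';
				$query1 = '';
				$query2 = ") VALUES (";
				$query3 = '';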

//make the (`fieldname`, `fieldname`, `fieldname`) query-bit 
//and the ('fieldvalue', 'fieldvalue', 'fieldvalue') query-bit :

				foreach($t as $key=>$value) {
					if($key!=$id) {	
						$query1 .="`".$key."`, ";
						$query3 .="'".$value."', ";
					}
				}

//remove the trailing ", "
				$query1=substr($query1, 0, strlen($query1)-2);
				$query3=substr($query3, 0, strlen($query3)-2);

//glue em up and add the final ")"
				$query0 .= $query1.$query2.$query3.")";

//query...
				$this->connection->query($query0);

//reset the strings
				$query0='';
				$query1='';
				$query2='';
				$query3='';
			}	
	}

that generates mysql queries like
INSERT INTO `serp_tags_keys` (`tag`, `cat`, `date`) VALUES ('ranking', 'alexa ranking', '200901')
and copies the entire table.
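Gluing the values straight into the query breaks on the first value with a quote in it. One way around that is to build the statement with placeholders and keep the values separate. A sketch, where build_insert() and the sample record are made up :

```php
<?php
// Hypothetical helper: build a parameterized INSERT for one record.
// Returns array($sql, $params); $skip is the auto-increment field to leave out.
function build_insert($tname, array $record, $skip)
{
    $fields = array();
    $params = array();
    foreach ($record as $key => $value) {
        if ($key === $skip) continue;
        // backticks inside identifiers are doubled, as mysql expects
        $fields[] = '`' . str_replace('`', '``', $key) . '`';
        $params[] = $value;
    }
    $marks = implode(', ', array_fill(0, count($fields), '?'));
    $sql = 'INSERT INTO `' . str_replace('`', '``', $tname) . '` ('
         . implode(', ', $fields) . ') VALUES (' . $marks . ')';
    return array($sql, $params);
}

// made-up record with a quote in one of the values
list($sql, $params) = build_insert(
    'serp_tags_keys',
    array('id' => '4', 'tag' => 'ranking', 'cat' => "alexa's ranking", 'date' => '200901'),
    'id'
);
echo $sql, "\n";
```

If the receiving host uses PDO, the pair can go to prepare($sql) and execute($params); with another database layer the same split between statement and values still applies.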

That is how I handle the table data.

Of course I have to define two custom classes to process the serp.putTable procedure itself, using the Incutio class.

First the class for the sending script, which is pretty straightforward :

  • make an IXR_Client instance
  • hand the record set to it
  • have it formatted and sent
//include the library
include('class-IXR.php');

//make a custom class that uses the IXR_client
Class SerpClient 
{
	var $rpcurl;         //endpoint
	var $username;   //you go figure
	var $password;
	var $bClient;      //incutio ixr-client instance
	var $myclient;  //machine/host-id
	
	   function SerpClient($rpcurl, $username, $password, $myclient)
    {
	$this->rpcurl	= $rpcurl;
    if (!$this->connect()) return false; 

    	//Standard variables to send in the message
	$this->rpcurl	= (string) $rpcurl;
    	$this->username = (string) $username;
    	$this->password = (string) $password;
	$this->myclient = (string) $myclient;
    	return $this;
    }
	
   		function connect() 
   {
//basic client, it takes the endpoint url, tests and returns true if it exists
    	if($this->bClient = new IXR_Client($this->rpcurl)) return true;
    }
	
//the function I use to send the data
		function putTable($tablename, $recordcount, $tableid, $array) 
	{
//first parameter is always the methodname, then the parameters, which are
//added sequentially to the xml-file (with the appropriate tags for datatypes,
//the script figures that out). note : it uses htmlentities on strings.
		$this->bClient->query('serp.putTable', $this->username, $this->password, $tablename, $recordcount, $tableid, $array);
	}

}

I use it in the snippets above with :

$ThisClient = New SerpClient('http://serp.trismegistos.net/db/xmlrpc.php', 'user', 'pass', 'sender');
//...
$ThisClient->putTable($tablename, $recordcount, $tableid, $records);

Then, on the receiving end, my program has to know how to handle the xml containing the remote procedure call.

I define an extension on IXR_server and pass serp.putTable as new ‘method’ (callback function).

//go away cookie...
$_COOKIE = array();

//make sure you get the posted crap, the ixr instance grabs its input from it
if ( !isset( $HTTP_RAW_POST_DATA ) ) $HTTP_RAW_POST_DATA = file_get_contents( 'php://input' );
if ( isset($HTTP_RAW_POST_DATA) ) $HTTP_RAW_POST_DATA = trim($HTTP_RAW_POST_DATA);

//include the library
include('class-IXR.php');

//make an extended class
class serp_xmlrpc_server extends IXR_Server {

//use the same function name...

	function serp_xmlrpc_server() {

//build an array of methods : 
//first the procedurename you use in the xml-text,
//then which function in the extended class (this one) it maps to 
//to be used as $this->method

		$this->methods = array('serp.putTable'	 => 'this:putTable');

//hand em to the IXR server instance that will map it as callback
		$this->IXR_Server($this->methods);
	}

//now IXR_Server instance uses ($this->)putTable 
//to process incoming xml-text 
//containing serp.putTable as methodname

		function putTable($args) 
	{
//(for routine : see the snippet above to store the xml data in mysql)
	}
}

//make the class instance like any regular get-post php program, 
//the only actual program line, that instantiates the extended class,
//which handles the posted xml 

$serp_xmlrpc_server = new serp_xmlrpc_server();

That’s all. I am not going to list a cut-and-paste version. You have to build some stuff with it, then you will come up with lots of stuff you can do with it.

WordPress and iPhone built a plugin that receives pictures from iPhone. WordPress uses Incutio so you can ‘piggyback’ on that and have an iPhone plugin for your own website in two days flat using an ajax lightbox gallery script. Or go monetize small websites with some seo oriented ‘optimisation’ functions like ChangeFooterLinks(array($paidurl, $anchortext)) :) or whatever… boring, isn’t it ?