Categories
links php seo tips and tricks

curl trackbacks

I figured I’d blog a post on trackback link building. A trackback is … (post a few and you’ll get it). The trackback protocol itself isn’t that interesting, but the way blog platforms and CMSes implement it makes it an excellent means for network development, because it uses a simple HTTP POST, and cURL makes that easy.

To post a successful link proposal I need some basic data :

about my page

  • url (must exist)
  • blog owner (free)
  • blog name (free)

about the other page

  • url (must exist)
  • excerpt (should be proper normal text)

my page : this is preferably a php routine that hacks some text, pictures and videos, PLR or articles together, with a url rewrite. I prefer using xml text files instead of a database, it works faster when you set stuff up.

other page : don’t use “I liked your article so much…”, use text that matches text on the target pages, preferably proper excerpts from xml feeds like blogsearch, msn and yahoo (the excerpts contain the keywords I searched for, which as anchor text works better for search engine visibility and link value).

Let’s get some stuff from the MSN rss feed :

//a generic query = 5% success
//add "(powered by) wordpress" 
      //urlencode turns the spaces into +, a literal + would become %2B
      $query=urlencode('keywords wordpress trackback');
      $xml = @simplexml_load_file("http://search.live.com/results.aspx?q=$query&count=50&first=1&format=rss");
      $count=0;
      foreach($xml->channel->item as $i) {

           $count++;

//the data from msn
           $target['link'] = (string) $i->link;
           $target['title'] = (string) $i->title;
           $target['excerpt'] = (string) $i->description;

//some variables I'll need later on
           $target['id'] = $count;
           $target['trackback'] = '';
           $target['trackback_success'] = 0;

           $trackbacks[]=$target;
       }

25% of the cms sites in the top of the search engines are WordPress installs, and WordPress always uses /trackback/ in the rdf url. I get the source of each url in the search feed and grab all link urls in it. If any contains /trackback/, I post a trackback to that url and see if it sticks.

(I could also spider all links and check whether there is an rdf segment in the target’s source (*1), but that takes a lot of time. I could also build an array of curl handles and use multicurl, but for my purposes this works fast enough.)

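for($t=0;$t<count($trackbacks);$t++) {
	//fetch the source of the target page
	//(file_get_contents is an assumption, the original fetch line got lost)
	$content = @file_get_contents($trackbacks[$t]['link']);
	if(!$content) continue;

	//grab all links : $matches[1] holds the urls
	preg_match_all("/<a[\s]+[^>]*?href[\s]?=[\s\"\']+".
           "(.*?)[\"\']+.*?>"."([^<]+|.*?)?<\/a>/",
        $content, $matches);
	$uri_array = $matches[1];
	foreach($uri_array as $key => $link) { 
             //matches both Trackback and trackback
             if(strpos($link, 'rackbac')!==false) { 
                $trackbacks[$t]['trackback'] = $link;
                break; 
             }
        }
}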
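
The regular expression above is fragile against odd markup. As an alternative, here is a minimal sketch, assuming PHP's DOM extension is available: parse the page with DOMDocument and return the first href that contains "trackback". The function name find_trackback_link is made up for this example, it is not part of the original script.

```php
<?php
// Hypothetical helper: return the first link in $html whose href
// contains "trackback" (case-insensitive), or '' if there is none.
function find_trackback_link($html)
{
    if (trim($html) === '') return '';

    // real-world pages are rarely valid markup, so keep libxml quiet
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if (stripos($href, 'trackback') !== false) {
            return $href;
        }
    }
    return '';
}

// usage with a made-up page fragment
$html = '<p><a href="http://example.com/post/">post</a> '
      . '<a href="http://example.com/post/trackback/">trackback</a></p>';
echo find_trackback_link($html), "\n";
```

It is slower than a regular expression, but it does not trip over attribute order or quoting styles.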

When I fire a trackback, the other script will try to check whether my page has a link and matching text. I have to make sure my page shows the excerpts and links, so I stuff all candidates in a cached xml file.

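function cache_xml_store($trackbacks, $pagetitle) 
{
	//element names match what cache_xml_retrieve() reads below;
	//the root element name <entries> is an assumption
	$xml = '<?xml version="1.0" encoding="UTF-8"?>
	<entries>';
	for($a=0;$a<count($trackbacks);$a++) {
		$arr = $trackbacks[$a];
		$xml .= '<entry>';
		$xml .= '<id>'.$arr['id'].'</id>';
		$xml .= '<description>'.htmlspecialchars($arr['excerpt']).'</description>';
		$xml .= '<link>'.htmlspecialchars($arr['link']).'</link>';
		$xml .= '<title>'.htmlspecialchars($arr['title']).'</title>';
		$xml .= '</entry>';
	}
	$xml .= '</entries>';
	
	$fname = 'cache/trackback'.urlencode($pagetitle).'.xml';
	//$fname already contains the cache/ directory
	if(file_exists($fname)) unlink($fname);
	$fhandle = fopen($fname, 'w');
	fwrite($fhandle, $xml);
	fclose($fhandle);
	return;
}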
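
Titles and excerpts scraped from a feed often contain & or <, which breaks an xml string that is glued together by hand. Here is a minimal sketch, assuming the XMLWriter extension is loaded, that builds the same kind of document and leaves the escaping to the writer. The function name build_cache_xml and the root element entries are assumptions for the example.

```php
<?php
// Hypothetical helper: build the cache document as a string.
// XMLWriter escapes &, < and > in element text.
function build_cache_xml(array $trackbacks)
{
    $w = new XMLWriter();
    $w->openMemory();
    $w->startDocument('1.0', 'UTF-8');
    $w->startElement('entries');            // root name is an assumption
    foreach ($trackbacks as $arr) {
        $w->startElement('entry');
        $w->writeElement('id', (string) $arr['id']);
        $w->writeElement('description', $arr['excerpt']);
        $w->writeElement('link', $arr['link']);
        $w->writeElement('title', $arr['title']);
        $w->endElement();                   // closes entry
    }
    $w->endElement();                       // closes entries
    $w->endDocument();
    return $w->outputMemory();
}

// usage with made-up data, read back with simplexml
$xml = build_cache_xml(array(array(
    'id'      => 1,
    'excerpt' => 'Trends & <b>tips</b>',
    'link'    => 'http://example.com/?a=1&b=2',
    'title'   => 'Financial trends',
)));
$check = simplexml_load_string($xml);
echo (string) $check->entry[0]->link, "\n";
```

The string it returns can be written to the cache file with fwrite just like before.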

I use simplexml to read that cached file and show the excerpts and links once the page is requested.

// retrieve the cached xml and return it as array.
function cache_xml_retrieve($pagetitle)
{
	$fname = 'cache/trackback'.urlencode($pagetitle).'.xml';
	if(file_exists($fname)) {
		$xml=@simplexml_load_file($fname);
		if(!$xml) return false;
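		$trackbacks = array();
		foreach($xml->entry as $e) {
			$trackback = array();
			$trackback['id'] =(string) $e->id;
			//the original wrapped the link in rid(), a helper that is not shown in this post
			$trackback['link'] =  (string) $e->link;
			$trackback['title'] =  (string) $e->title;
			$trackback['description'] =  (string) $e->description;

			$trackbacks[] = $trackback;
		}
		return $trackbacks;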
	} 
	return false;
}

(this setup requires a subdirectory cache set to read/write with chmod 777)

I use http://www.domain.com/financial+trends.html and extract the pagetitle as “financial trends”, which has an xml file http://www.domain.com/cache/trackbackfinancial+trends.xml. (In my own script I use sef urls with mod_rewrite, you can also use the $_SERVER array.)

$pagetitle=preg_replace('/\+/', ' ', htmlentities($_REQUEST['title'], ENT_QUOTES, "UTF-8"));

$cached_excerpts = cache_xml_retrieve($pagetitle);

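//do some stuff with it, make it look nice :
//(the markup is a reconstruction, the original echo line got mangled)
for($s=0;is_array($cached_excerpts) && $s<count($cached_excerpts);$s++) {
	echo '<p><a href="'.$cached_excerpts[$s]['link'].'">'.$cached_excerpts[$s]['title'].'</a> '.$cached_excerpts[$s]['description'].'</p>';
}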

Now I prepare the data for the trackback post :

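for($t=0;$t<count($trackbacks);$t++) {
    $trackback_url = $trackbacks[$t]['trackback'];
    //only candidates where a trackback url was found
    if($trackback_url != '') {
        $mytrackbackdata = array(
 	"url" => "url of my page with the link to the target",
 	"title" => "title of my page",
	"blog_name" => "name of my blog",
	"excerpt" => '[...]'.trim(substr($trackbacks[$t]['excerpt'], 0, 150)).'[...]'
        );
        //...and try the trackback
        $trackbacks[$t]['trackback_success'] = trackback_ping($trackback_url, $mytrackbackdata);
    }
}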

This is the actual trackback post using cURL. cURL has a convenient timeout setting, I use three seconds. If a host does not respond in half a second it’s probably dead, so three seconds is generous.

function trackback_ping($trackback_url, $trackback)
	{

//make a string of the data array to post
	foreach($trackback as $key=>$value) $strout[]=$key."=".rawurlencode($value);
        $postfields= implode('&', $strout);
		
//create a curl instance
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, $trackback_url);
	curl_setopt($ch, CURLOPT_TIMEOUT, 3);
	curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

//set a custom form header
	curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-type: application/x-www-form-urlencoded'));

	curl_setopt($ch, CURLOPT_NOBODY, true);

        curl_setopt($ch, CURLOPT_POST, true);
	curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);	
		
	$content = curl_exec($ch);

//if the return has a tag 'error' with as value 0 it went flawless
	$success = 0;	
	if(strpos($content, '>0')>0) $success = 1; 
	curl_close ($ch);
	unset($ch);
	return $success;
	}
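
The strpos test on '>0' can also match unrelated markup in an error page. A trackback response is a small xml document with an error element, where 0 means success, so it can be parsed instead. Here is a minimal sketch, with trackback_succeeded as a made-up function name that is not part of the original script:

```php
<?php
// Hypothetical helper: true only when $response is well-formed xml
// with an <error> element whose value is 0.
function trackback_succeeded($response)
{
    if (!is_string($response) || trim($response) === '') return false;

    // suppress libxml warnings for responses that are not xml at all
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string(trim($response));
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false || !isset($xml->error)) return false;
    return trim((string) $xml->error) === '0';
}

// usage with hand-written sample responses
$ok   = '<?xml version="1.0" encoding="utf-8"?><response><error>0</error></response>';
$fail = '<?xml version="1.0" encoding="utf-8"?><response><error>1</error>'
      . '<message>Sorry, trackbacks are closed.</message></response>';
var_dump(trackback_succeeded($ok));
var_dump(trackback_succeeded($fail));
```

In trackback_ping() the result of curl_exec() could be passed to it directly, since a failed request returns false and is rejected by the first check.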

Now the last routine : rewrite the cached xml file with only the successful trackbacks (seo stuff) :

$store_trackbacks = array();
for($t=0;$t<count($trackbacks);$t++) {
    if($trackbacks[$t]['trackback_success']>0) {
        $store_trackbacks[]=$trackbacks[$t];
    }
}
cache_xml_store($store_trackbacks, $pagetitle);

Voilà : a page with only successful trackbacks.

Google (the backrub engine) doesn’t like sites that use automated link-building methods, other engines (Baidu, MSN, Yahoo) use a more normal link popularity and keyword matching algorithm. Trackback linking helps you get a clear engine profile at relatively low cost.

0) For brevity and clarity, the code above is rewritten (taken from a trackback script I am developing on another site), so it can contain some typos.

*1) If you want to spider links for rdf segments : TYPO3 v4 has some code for easy retrieval of trackback uris :

/**
	 * Fetches ping url from the given url
	 *
	 * @param	string	$url	URL to probe for RDF
	 * @return	string	Ping URL
	 */
	protected function getPingURL($url) {
		$pingUrl = '';
		// Get URL content
		$urlContent = t3lib_div::getURL($url);
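		if ($urlContent && ($rdfPos = strpos($urlContent, '<rdf:RDF')) !== false) {
			// RDF block found, look for its end
			if (($endPos = strpos($urlContent, '</rdf:RDF>', $rdfPos)) !== false) {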
				// We will use quick regular expression to find ping URL
				$rdfContent = substr($urlContent, $rdfPos, $endPos);
				$pingUrl = preg_replace('/trackback:ping="([^"]+)"/', '\1', $rdfContent);
			}
		}
		return $pingUrl;
	}
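
The snippet above relies on preg_replace, which hands back the whole rdf block with only the attribute swapped out. If all you need is the url, a single preg_match with a capture group is enough. This is a minimal sketch outside of TYPO3, with extract_ping_url as a made-up function name:

```php
<?php
// Hypothetical helper: pull the trackback:ping url out of a page's
// embedded rdf block, or return '' when there is none.
function extract_ping_url($html)
{
    if (preg_match('/trackback:ping="([^"]+)"/', $html, $m)) {
        // urls inside markup may carry &amp; entities
        return html_entity_decode($m[1]);
    }
    return '';
}

// usage with a made-up rdf fragment
$page = '<!-- <rdf:RDF><rdf:Description dc:title="A post" '
      . 'trackback:ping="http://example.com/post/trackback/?a=1&amp;b=2" />'
      . '</rdf:RDF> -->';
echo extract_ping_url($page), "\n";
```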
Categories
optimization seo tips and tricks

synonymizer with api

If you want to put some old content on the net and have it indexed as fresh unique content, this works wonders for seo-friendly backlinks : the automated synonymizer. I want one that makes my content unique without having to type one character.

Lucky for me, mister John Watson’s synonym database comes with a free API good for 10,000 requests a day, and boy is it sweet!

API Requests are straightforward :
http://words.bighugelabs.com/api/2/[apikey]/[keyword]/xml

A number of return formats are supported but xml is easiest, either for parsing with simplexml or regular pattern matching.

A request for black returns an xml file like this (slightly shortened) :
<words>
<w p="adjective" r="syn">bleak</w>
<w p="adjective" r="syn">sinister</w>
<w p="adjective" r="sim">dark</w>
<w p="adjective" r="sim">angry</w>
<w p="noun" r="syn">blackness</w>
<w p="noun" r="syn">inkiness</w>
<w p="verb" r="syn">blacken</w>
<w p="verb" r="syn">melanize</w>
</words>

…which is easiest handled with preg_match_all :

function getsynonyms($keyword) {
        $pick = array(); 
	$apikey = 'get your own key';
	$xml=@file_get_contents('http://words.bighugelabs.com/api/2/'.$apikey.'/'.urlencode($keyword).'/xml');

	if(!$xml) return $pick; //return empty array

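	preg_match_all('/<w p="adjective" r="syn">(.*?)<\/w>/', $xml, $adj_syns);
	//preg_match_all('/<w p="adjective" r="sim">(.*?)<\/w>/', $xml, $adj_sims);
	//preg_match_all('/<w p="noun" r="syn">(.*?)<\/w>/', $xml, $noun_syns);
	//preg_match_all('/<w p="verb" r="syn">(.*?)<\/w>/', $xml, $verb_syns);

	//index 1 holds the captured words, index 0 the full match including the tags
	foreach($adj_syns[1] as $adj_syn) $pick[]=$adj_syn;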
        //same for verb/noun synonyms, I just want adjectives

	return $pick;
}
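
The same selection can be done with simplexml, which reads the p and r attributes directly instead of baking them into a pattern. This is a minimal sketch run against a hand-written sample rather than a live API response, with pick_words as a made-up function name:

```php
<?php
// Hypothetical helper: return the words in $xml that match a part of
// speech ($part) and a relation ($rel, for example "syn" or "sim").
function pick_words($xml, $part, $rel)
{
    $pick = array();
    $doc = @simplexml_load_string($xml);
    if ($doc === false) return $pick;   // not xml : return empty array

    foreach ($doc->w as $w) {
        if ((string) $w['p'] === $part && (string) $w['r'] === $rel) {
            $pick[] = (string) $w;
        }
    }
    return $pick;
}

// usage with a shortened sample in the format shown above
$sample = '<words>'
        . '<w p="adjective" r="syn">bleak</w>'
        . '<w p="adjective" r="sim">dark</w>'
        . '<w p="noun" r="syn">blackness</w>'
        . '</words>';
print_r(pick_words($sample, 'adjective', 'syn'));
```

One function then covers adjectives, nouns and verbs without a separate pattern for each.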

Applying it in practice, I take a slab of stale old content and…

  • strip tags
  • do a regular match on all alphanumeric sequences dropping other stuff
  • trim the resulting array elements
  • (merge all blog tags, categories, and a list of common words)
  • exclude these common terms from the array with text elements
  • exclude words smaller than N characters
  • set a percentage of words to be synonymized
  • attempt to retrieve synonyms for remaining terms
  • replace these words in the original text, keep count
  • when I reach the target replacement percentage, abort
  • return (hopefully) a revived text

function synonymize($origtext) {

//make a copy of the original text to dissect
	$content=$origtext;
	//content = $this->body;
	
	$perc=3;			//target percentage changed terms
	$minlength=4;		//minimum length candidates
	$maxrequests=80;	//max use of api-requests


	//dump tags	
	$content =  strip_tags($content);
	
	//dump non-alphanumeric	string characters
	$content = preg_replace('/[^A-Za-z0-9\-]/', ' ', $content);
	
	//explode on blank space
	$wrds = explode(' ', strtolower($content));
	
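	//trim off blank spaces just in case
	for($w=0;$w<count($wrds);$w++) $wrds[$w] = trim($wrds[$w]);

	//the lines from here to the length check are a reconstruction,
	//the original got mangled : number of replacements to aim for
	$toswitch = round(count($wrds) * $perc / 100);
	$total_switched = 0;

	//placeholder list : merge blog tags, categories and common words here
	$common = array('that', 'this', 'with', 'from', 'have');

	$words_unique = array_values(array_unique($wrds));
	$words_select = array();
	for($i=0;$i<count($words_unique);$i++) {
		if(!in_array($words_unique[$i], $common)) {
			if(strlen($words_unique[$i])>$minlength) {
			//words_select contains candidates for synonyms
				$words_select[] = trim($words_unique[$i]);
			}
		}
	}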
	
	//terms that can be changed
	$max = count($words_select);
	
	//no more requests than max
	if($max>$maxrequests) $max=$maxrequests;
	
	for($i=0;$i< $max;$i++) {
	//get synonyms, give server some time
		usleep(100000);
		//retrieve synonyms etc.
		$these_words = getsynonyms($words_select[$i]);
		$jmax=count($these_words);
		if($jmax<1) {
		//no results
		} else {
			$count=0;
			$j=0;
//the replacements are done in the original text, on whole words only
			$origtext= preg_replace('/\b'.preg_quote($words_select[$i], '/').'\b/i', $these_words[$j], $origtext, -1, $count);
			$total_switched+=$count;

		} //have we reached the percentage ? 
		if($total_switched>=$toswitch) break;
	}
	//okay!
	return $origtext;
}

function getsynonyms($keyword) {
	$pick=array	();
	$apikey = 'get your own key at bighugelabs.com';
	$xml=@file_get_contents('http://words.bighugelabs.com/api/2/'.$apikey.'/'.urlencode($keyword).'/xml');
	if(!$xml) return $pick;
	preg_match_all('/<w p="adjective" r="syn">(.*?)<\/w>/', $xml, $adj_syns);
	foreach($adj_syns[1] as $adj_syn) $pick[]=$adj_syn;
	return $pick;
}

Nothing fancy, a straightforward search-replace routine. A 1200 word text has about 150 candidates, and for 3% synonyms I need to replace 36 words, which it can do. If I were to use it for real I would build a table with non-returning terms and store often used terms. That would speed up the synonymizing, allow the use of preferences and take a load off the api.
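
The table idea can be tried out with a plain array before a database gets involved. This is a minimal sketch with cached_synonyms as a made-up wrapper; the lookup is passed in as a callable, so the getsynonyms() function above can be plugged in by name. Empty results are stored as well, which covers the non-returning terms.

```php
<?php
// Hypothetical wrapper: look a word up in a local cache first and only
// call $fetch on a miss. Empty results are cached too, so a term that
// returns nothing costs one request only.
function cached_synonyms($word, callable $fetch, array &$cache)
{
    $key = strtolower($word);
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $fetch($key);
    }
    return $cache[$key];
}

// usage with a fake lookup that counts how often it is called
$calls = 0;
$fake = function ($word) use (&$calls) {
    $calls++;
    return $word === 'black' ? array('bleak', 'sinister') : array();
};
$cache = array();
cached_synonyms('black', $fake, $cache);
cached_synonyms('Black', $fake, $cache);
cached_synonyms('xyzzy', $fake, $cache);
cached_synonyms('xyzzy', $fake, $cache);
echo $calls, "\n";
```

To keep the cache between runs, the array could be serialized to a file or moved to a mysql table, whichever you already have lying around.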

Categories
seo tips and tricks wordpress

How to grab keywords from 7search

“Seo tips and tricks” was not due til November, but this one just popped up today. I was looking for a tool to build a rapid keyword set for a blog, without doing extensive keyword research. The blackhat ‘scraper’ scripts I found come up with ‘michigan seo’ far too many times :)

How to grab keyword sets from 7Search

I want a set of keywords as blog categories, to write a blog that contains material with the most popular keywords, covering the whole active search pattern set. A nice tool for that is 7Search’s keyword tool.

It has captcha protection : you answer it once and then you can query as much as you like. It shows last month’s top 100 search patterns with that keyword and the search volumes :

seo 1,991,112 $0.34 $0.33 $0.21 $0.09 $0.08
seo web design 8,085 $0.07 $0.02 $0.01
seo tool 2,647 $0.05 $0.02 $0.02 $0.01

As I am extremely lazy and hate typing data, I’ll make a quick script to cut and paste that list and have it magically transformed into a wordpress blog category list.

It turned out to be a simple one page program : I do a query on a keyword, select the result table area of the 7Search page (with the mouse : ) and paste it as text into my own form’s textarea, add the main key, and post it.

From the $_POST array, I take the textarea input and explode it on linebreaks. To get the keywords, I check for the first occurrence of 0-9, take the part that comes before it, and have the keywords.

In this function I test for the first 0-9. Had I looped from 0 upward and stopped at the first digit found, I would get thrown out of the loop as soon as there is any 0 in the line (or 1, 2, 3…), regardless of whether another digit comes before that 0 :

for($i=0;$i<10;$i++) {
    //cast to string so the digit itself is searched for
    $pos = strpos($linesarr[$x], (string) $i);
    if($pos >0) {
        if($pos< $minpos) { 
            $mykeys = substr($linesarr[$x],0,$minpos);
            echo $mykeys;
            break; 
        }
    }
}

So I test for the first 0 and store the position in minpos, then test for the first 1, 2, 3... If one of them comes before the first 0, minpos is set to that lower position.

$lines=$_POST['textarea'];
$linesarr = explode("\r\n", $lines);

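for ($x=0;$x<count($linesarr);$x++) {
//start minpos at the end of the line
    $minpos = strlen($linesarr[$x]);
    for($i=0;$i<10;$i++) {
        $pos = strpos($linesarr[$x], (string) $i);
        if($pos>0) {
            if($pos< $minpos) $minpos=$pos; 
        }
    }  

//is minpos smaller than the length of the line ? then its valid data
    if($minpos<strlen($linesarr[$x])) {
        $mykeys = trim(substr($linesarr[$x], 0, $minpos));
        echo $mykeys;
    }
}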

That way I always get the first number in the line and the part before it is the whole keyword text.

I also want the search volume, which is the first full string after the keywords, up to the first dollar sign. The minpos counter is already at the first digit of the volume, so I get the position of the first dollar sign and trim off the blanks.

//the volume is at the start of the string after minpos
	$volstr = trim(substr($linesarr[$x], $minpos));
//and before the first dollar sign
	$volcut = strpos($volstr, "$");
//it contains "," : 9,111,222 so filter out the nonsense for mysql :
	$vol = preg_replace('/,/', '', trim(substr($volstr, 0, $volcut))); 

//the rest of this line got cut off, presumably the same validity check as above
	if($minpos<strlen($linesarr[$x])) {
		echo $mykeys.' : '.$vol;
	}

[After this I stuff the data in a mysql table `sevencats`].
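
The digit loop and the volume cut can be folded into one small function. This is a minimal sketch, assuming the pasted lines look like the 7Search sample above; parse_keyword_line is a made-up name. It uses strcspn, which returns the length of the leading part of a string that contains none of the given characters, so with all ten digits as the mask it gives the position of the first digit in one call.

```php
<?php
// Hypothetical helper: split one pasted line into keyword and volume.
// Returns array(keyword, volume), or false when the line has no digits
// or starts with one.
function parse_keyword_line($line)
{
    // position of the first digit, or strlen($line) if there is none
    $minpos = strcspn($line, '0123456789');
    if ($minpos === 0 || $minpos >= strlen($line)) return false;

    $keyword = trim(substr($line, 0, $minpos));
    $rest    = substr($line, $minpos);

    // the volume runs up to the first dollar sign (or the end of the line)
    $volcut = strpos($rest, '$');
    $volstr = ($volcut === false) ? $rest : substr($rest, 0, $volcut);
    $volume = (int) str_replace(',', '', trim($volstr));

    return array($keyword, $volume);
}

// usage with a line from the sample above
print_r(parse_keyword_line('seo web design 8,085 $0.07 $0.02 $0.01'));
```

Like the loop version, it assumes the keyword itself contains no digits, so a phrase such as "web 2.0" would be cut short.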

How to add a keyword list to wordpress as categories

Let's add the keywords to a wordpress blog as categories. WordPress has a very simple function for it : wp_insert_term, in the taxonomy.php file.

In wpmu you first have to pick the target blog, as each blog works on its own table set (wp1_, wp2_ etcetera), and when you start it up you get the admin user's main blog as the active table set. If you want to add data like categories to another blog's taxonomy table, you have to switch to that table set first.

function connect_data() {
		$DB_USER =  "";
		$DB_PASSWORD = "";
		$DB_HOST = "";
		$DB_DATA = "";
		$link =  mysql_connect($DB_HOST, $DB_USER, $DB_PASSWORD) or $error = mysql_error();
		if (!$link) {
	    	return $error; 
		} 	 
        mysql_select_db($DB_DATA, $link) or $error = mysql_error();
		return $link;
	}

//link
$cats=connect_data();

//get array with categories 
$categories=array();

$qry="SELECT cat FROM `sevencats`";
$lst=mysql_query($qry, $cats) or die('list error '.mysql_error());
while($row=mysql_fetch_assoc($lst)) {
	$categories[]=$row['cat'];
}
//close db connection
mysql_close($cats);
	
//open wordpress connection
include_once('wp-config.php');
include_once('wp-includes/wp-db.php');
include_once('wp-includes/taxonomy.php');

//select target blog by id
switch_to_blog(3);

//insert categories
for ($i=0;$i<count($categories);$i++) {
	wp_insert_term($categories[$i], 'category');
}

For a normal wordpress install you'd not have to switch blogs :

//open wordpress connection
include_once('wp-config.php');
include_once('wp-includes/wp-db.php');
include_once('wp-includes/taxonomy.php');

//insert categories
for ($i=0;$i<count($categories);$i++) {
	wp_insert_term($categories[$i], 'category');
}

That gets me the top 100 searches of last month as categories for my new blog. You can fiddle with it a bit and only pick searches with a volume above 2000 monthly searches (just in case you want to go scraping and only want material that gets you in the serp pages for the high volume search terms).
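
The volume cut-off could look something like this. It is a minimal sketch that works on an array of rows rather than on the database. The 'volume' key is an assumption, since the post only shows the cat column of `sevencats`, and both the function name and the last sample row are made up.

```php
<?php
// Hypothetical filter: keep only the keywords whose monthly volume
// is above $minvolume.
function high_volume_categories(array $rows, $minvolume)
{
    $keep = array();
    foreach ($rows as $row) {
        if ((int) $row['volume'] > $minvolume) {
            $keep[] = $row['cat'];
        }
    }
    return $keep;
}

// usage with the sample rows from above plus one made-up low-volume row
$rows = array(
    array('cat' => 'seo',            'volume' => 1991112),
    array('cat' => 'seo web design', 'volume' => 8085),
    array('cat' => 'seo tool',       'volume' => 2647),
    array('cat' => 'seo jingle',     'volume' => 120),
);
print_r(high_volume_categories($rows, 2000));
```

The array it returns can be fed to the wp_insert_term loop in place of $categories.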

Next edition : Red Hat Seo (with jingle bells) the Christmas Special :)