synonymizer with api

If you want to put some old content on the net and have it indexed as fresh unique content, this works wonders for SEO-friendly backlinks: the automated synonymizer. I want one that makes my content unique without my having to type a single character.

Lucky for me, Mr. John Watson’s synonym database comes with a free 10,000-requests-a-day API, and boy is it sweet!

API requests are straightforward:
http://words.bighugelabs.com/api/2/[apikey]/[keyword]/xml

A number of return formats are supported, but XML is easiest, whether you parse it with SimpleXML or with regular expression matching.

On a request for black it returns (slightly shortened here) an XML file like:
<words>
<w p="adjective" r="syn">bleak</w>
<w p="adjective" r="syn">sinister</w>
<w p="adjective" r="sim">dark</w>
<w p="adjective" r="sim">angry</w>
<w p="noun" r="syn">blackness</w>
<w p="noun" r="syn">inkiness</w>
<w p="verb" r="syn">blacken</w>
<w p="verb" r="syn">melanize</w>
</words>

…which is most easily handled with preg_match_all:

function getsynonyms($keyword) {
	$pick = array();
	$apikey = 'get your own key';
	$xml = file_get_contents('http://words.bighugelabs.com/api/2/'.$apikey.'/'.$keyword.'/xml');

	if(!$xml) return $pick; //return empty array

	preg_match_all('/<w p="adjective" r="syn">(.*?)<\/w>/', $xml, $adj_syns);
	//preg_match_all('/<w p="adjective" r="sim">(.*?)<\/w>/', $xml, $adj_sims);
	//preg_match_all('/<w p="noun" r="syn">(.*?)<\/w>/', $xml, $noun_syns);
	//preg_match_all('/<w p="verb" r="syn">(.*?)<\/w>/', $xml, $verb_syns);

	//index 1 holds the captured words, index 0 the full tags
	foreach($adj_syns[1] as $adj_syn) $pick[] = $adj_syn;
	//same for verb/noun synonyms, I just want adjectives

	return $pick;
}
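For those who would rather use the SimpleXML route mentioned above, the same lookup could look roughly like this. This is a sketch only: the function name is mine, and it assumes the response format shown earlier.

//sketch of a SimpleXML-based variant; function name is mine, not from the original
function getsynonyms_simplexml($keyword) {
	$pick = array();
	$apikey = 'get your own key';
	$xml = @file_get_contents('http://words.bighugelabs.com/api/2/'.$apikey.'/'.urlencode($keyword).'/xml');
	if(!$xml) return $pick;

	$words = simplexml_load_string($xml);
	if($words === false) return $pick;

	foreach($words->w as $w) {
		//keep adjective synonyms only, like the regex version
		if((string)$w['p'] == 'adjective' && (string)$w['r'] == 'syn') {
			$pick[] = (string)$w;
		}
	}
	return $pick;
}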

Practically applying it, I take a slab of stale old content and…

  • strip tags
  • run a regular expression match on all alphanumeric sequences, dropping everything else
  • trim the resulting array elements
  • (merge all blog tags, categories, and a list of common words)
  • exclude those common terms from the array of text elements
  • exclude words shorter than N characters
  • set a percentage of words to be synonymized
  • attempt to retrieve synonyms for the remaining terms
  • replace these words in the original text, keeping count
  • when I reach the target replacement percentage, abort
  • return a (hopefully) revived text
function synonymize($origtext) {

	//make a copy of the original text to dissect
	$content = $origtext;
	//$content = $this->body;
	
	$perc=3;			//target percentage changed terms
	$minlength=4;		//minimum length candidates
	$maxrequests=80;	//max use of api-requests


	//dump tags	
	$content =  strip_tags($content);
	
	//dump non-alphanumeric	string characters
	$content = preg_replace('/[^A-Za-z0-9\-]/', ' ', $content);
	
	//explode on blank space
	$wrds = explode(' ', strtolower($content));
	
	//trim off blank spaces just in case
	for($w=0;$w<count($wrds);$w++) $wrds[$w] = trim($wrds[$w]);

	//number of words to switch for the target percentage
	$toswitch = floor(count($wrds) * ($perc/100));
	$total_switched = 0;

	//unique terms only; $common is where blog tags, categories and a stop word list get merged
	$words_unique = array_unique($wrds);
	$common = array('this', 'that', 'with', 'have', 'from', 'will');

	$words_select = array();
	for($i=0;$i<count($words_unique);$i++) {
		if(!in_array($words_unique[$i], $common) && strlen($words_unique[$i])>=$minlength) {
			//words_select contains candidates for synonyms
			$words_select[] = trim($words_unique[$i]);
		}
	}
	
	//terms that can be changed
	$max = count($words_select);
	
	//no more requests than max
	if($max>$maxrequests) $max=$maxrequests;
	
	for($i=0;$i<$max;$i++) {
		//get synonyms, give the server some time between requests
		usleep(100000);
		//retrieve synonyms etc.
		$these_words = getsynonyms($words_select[$i]);
		$jmax = count($these_words);
		if($jmax<1) {
			//no results, skip this word
		} else {
			$count = 0;
			$j = 0;
			//the replacements are done in the original text, using the first synonym
			$origtext = preg_replace('/'.$words_select[$i].'/i', $these_words[$j], $origtext, -1, $count);
			$total_switched += $count;
		}
		//have we reached the percentage?
		if($total_switched>=$toswitch) break;
	}
	//okay!
	return $origtext;
}

function getsynonyms($keyword) {
	$pick = array();
	$apikey = 'get your own key at bighugelabs.com';
	$xml = @file_get_contents('http://words.bighugelabs.com/api/2/'.$apikey.'/'.urlencode($keyword).'/xml');
	if(!$xml) return $pick;
	preg_match_all('/<w p="adjective" r="syn">(.*?)<\/w>/', $xml, $adj_syns);
	foreach($adj_syns[1] as $adj_syn) $pick[] = $adj_syn;
	return $pick;
}
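Hooking it all up is then a single call; a quick hypothetical example (the sample text is invented):

//hypothetical usage: feed an old post body through the synonymizer
$oldpost = '<p>The black night was bleak and the dark streets stood empty.</p>';
echo synonymize($oldpost);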

Nothing fancy, a straightforward search-and-replace routine. A 1200-word text has about 150 candidates, and for 3% synonyms I need to replace 36 words, which it can do. If I were to use it for real I would build a table of terms that return no synonyms, and store often-used terms; that would speed up the synonymizing, allow the use of preferences, and take a load off the API use.
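That caching idea could be as simple as a serialized lookup table keyed on the term. A rough sketch follows; the cache file name and the wrapper function are my own inventions, not part of the code above.

//rough sketch of a synonym cache wrapped around getsynonyms()
function getsynonyms_cached($keyword) {
	static $cache = null;
	$cachefile = 'synonym-cache.ser'; //assumed location

	if($cache === null) {
		$cache = is_file($cachefile) ? unserialize(file_get_contents($cachefile)) : array();
	}

	//terms that return nothing are stored as empty arrays, so the API is never asked twice
	if(!array_key_exists($keyword, $cache)) {
		$cache[$keyword] = getsynonyms($keyword);
		file_put_contents($cachefile, serialize($cache));
	}
	return $cache[$keyword];
}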


One Comment

  1. Unique content being key whenever re-using content, I’ve found that to make sure you always pass any human ‘spot’ check that might occur, you should take a part of the article out and/or reformat the paragraph structure. By that I mean the spacing/line breaks or any other formatting tags you may have used.

    I’ll do that once every 5-6 posts typically with pretty good results.
