juust ~ php oddities

synonymizer with api

juust | 28/12/2008

If you want to put some old content on the net and have it indexed as fresh, unique content, this works wonders for SEO-friendly backlinks: the automated synonymizer. I want one that makes my content unique without my having to type a single character.

Lucky for me, Mr. John Watson’s synonym database comes with a free API good for 10,000 requests a day, and boy is it sweet!

API requests are straightforward:
http://words.bighugelabs.com/api/2/[apikey]/[keyword]/xml
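The request is just a URL with the key and the word in the path. Here is a minimal sketch of building it. The helper name `bighuge_url()` is hypothetical, and using `rawurlencode()` is my own choice so that spaces or odd characters in a keyword don't break the path. The `xml` default matches the format used below.

```php
<?php
// Hypothetical helper: builds a Big Huge Thesaurus API v2 request URL
// following the pattern http://words.bighugelabs.com/api/2/[apikey]/[keyword]/xml
function bighuge_url($apikey, $keyword, $format = 'xml') {
    return 'http://words.bighugelabs.com/api/2/'
        . rawurlencode($apikey) . '/'
        . rawurlencode($keyword) . '/'
        . $format;
}

echo bighuge_url('MYKEY', 'black'), "\n";
// http://words.bighugelabs.com/api/2/MYKEY/black/xml
```

A multi-word keyword such as "ice cream" ends up as `ice%20cream` in the path.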

Several return formats are supported, but XML is the easiest to handle, whether you parse it with SimpleXML or with regular expressions.

A request for black returns an XML file like this (slightly shortened):
<words>
<w p="adjective" r="syn">bleak</w>
<w p="adjective" r="syn">sinister</w>
<w p="adjective" r="sim">dark</w>
<w p="adjective" r="sim">angry</w>
<w p="noun" r="syn">blackness</w>
<w p="noun" r="syn">inkiness</w>
<w p="verb" r="syn">blacken</w>
<w p="verb" r="syn">melanize</w>
</words>

…which is most easily handled with preg_match_all:

    function getsynonyms($keyword) {
        $pick = array();
        $apikey = 'get your own key';
        $xml = @file_get_contents('http://words.bighugelabs.com/api/2/'.$apikey.'/'.urlencode($keyword).'/xml');

        if (!$xml) return $pick; // return empty array

        // group 1 of each match holds the word between the tags
        preg_match_all('/<w p="adjective" r="syn">(.*?)<\/w>/', $xml, $adj_syns);
        //preg_match_all('/<w p="adjective" r="sim">(.*?)<\/w>/', $xml, $adj_sims);
        //preg_match_all('/<w p="noun" r="syn">(.*?)<\/w>/', $xml, $noun_syns);
        //preg_match_all('/<w p="verb" r="syn">(.*?)<\/w>/', $xml, $verb_syns);

        foreach ($adj_syns[1] as $adj_syn) $pick[] = $adj_syn;
        // same for verb/noun synonyms, I just want adjectives

        return $pick;
    }
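The SimpleXML route works just as well, and it doesn't depend on the exact attribute order or quoting in the response. Here is a minimal sketch that assumes the `<words>`/`<w p=… r=…>` structure shown above. The name `parse_synonyms()` is hypothetical. It takes the XML as a string, so you can try it without an API key.

```php
<?php
// Hypothetical helper: returns the words from a Big Huge Thesaurus XML
// response that match a part of speech ($pos) and a relation ($rel).
function parse_synonyms($xml, $pos = 'adjective', $rel = 'syn') {
    $pick = array();
    $doc = @simplexml_load_string($xml);
    if ($doc === false) return $pick; // not valid XML: return empty array

    foreach ($doc->w as $w) {
        // attributes come back as SimpleXMLElement objects, so cast to string
        if ((string) $w['p'] === $pos && (string) $w['r'] === $rel) {
            $pick[] = trim((string) $w);
        }
    }
    return $pick;
}

$sample = '<words>
<w p="adjective" r="syn">bleak</w>
<w p="adjective" r="syn">sinister</w>
<w p="adjective" r="sim">dark</w>
<w p="noun" r="syn">blackness</w>
</words>';

print_r(parse_synonyms($sample));         // bleak, sinister
print_r(parse_synonyms($sample, 'noun')); // blackness
```

To pick up the "similar" terms as well, call it again with `$rel = 'sim'`.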

Applying it in practice, I take a slab of stale old content and…

  • strip tags
  • match all alphanumeric sequences, dropping everything else
  • trim the resulting array elements
  • (merge all blog tags, categories, and a list of common words)
  • exclude common terms from the array of words
  • exclude words shorter than N characters
  • set the percentage of words to be synonymized
  • attempt to retrieve synonyms for the remaining terms
  • replace these words in the original text, keeping count
  • abort when I reach the target replacement percentage
  • return (hopefully) a revived text

    function synonymize($origtext) {

        // make a copy of the original text to dissect
        $content = $origtext;
        //$content = $this->body;

        $perc = 3;          // target percentage of changed terms
        $minlength = 4;     // candidates must be longer than this
        $maxrequests = 80;  // max use of api-requests

        $words = array();
        $words_select = array();
        $total_switched = 0;

        // dump tags
        $content = strip_tags($content);

        // dump non-alphanumeric characters (hyphens are kept)
        $content = preg_replace('/[^A-Za-z0-9\-]/', ' ', $content);

        // explode on blank space
        $wrds = explode(' ', strtolower($content));

        // trim, and skip the empty strings left by repeated spaces
        foreach ($wrds as $wrd) {
            $wrd = trim($wrd);
            if ($wrd !== '') $words[] = $wrd;
        }

        // this should be all words
        $wordcount = count($words);

        // how many words do I want changed?
        $toswitch = round($wordcount * $perc / 100);

        // only use uniques
        $words_unique = array_unique($words);

        // sort alphabetically (this also re-indexes the array from 0)
        sort($words_unique);

        // merge common with tags, categories, linked_tags
        $common = array("never", "about", "price");
        // note: setting the minlength to 4 excludes lots of common terms

        foreach ($words_unique as $word) {
            // if in common array, not selectable for synonymizing
            if (in_array($word, $common)) continue;
            // only terms longer than minlength are candidates for synonyms
            if (strlen($word) > $minlength) $words_select[] = $word;
        }

        // terms that can be changed, but no more requests than max
        $max = min(count($words_select), $maxrequests);

        for ($i = 0; $i < $max; $i++) {
            // give the server some time between requests
            usleep(100000);
            // retrieve synonyms
            $these_words = getsynonyms($words_select[$i]);
            if (count($these_words) < 1) continue; // no results

            // the replacements are done in the original text, whole words only;
            // the first synonym returned is used
            $count = 0;
            $pattern = '/\b' . preg_quote($words_select[$i], '/') . '\b/i';
            $origtext = preg_replace($pattern, $these_words[0], $origtext, -1, $count);
            $total_switched += $count;

            // have we reached the percentage?
            if ($total_switched >= $toswitch) break;
        }
        // okay!
        return $origtext;
    }

    function getsynonyms($keyword) {
        $pick = array();
        $apikey = 'get your own key at bighugelabs.com';
        $xml = @file_get_contents('http://words.bighugelabs.com/api/2/'.$apikey.'/'.urlencode($keyword).'/xml');
        if (!$xml) return $pick;
        preg_match_all('/<w p="adjective" r="syn">(.*?)<\/w>/', $xml, $adj_syns);
        foreach ($adj_syns[1] as $adj_syn) $pick[] = $adj_syn;
        return $pick;
    }
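Because the routine above hits the API for every candidate, it is awkward to try out. Here is a minimal offline sketch of the same steps. It assumes the synonyms come from a plain PHP array instead of getsynonyms(), and both `synonymize_with()` and its `$lookup` parameter are hypothetical. It keeps candidates in text order instead of sorting them. It also matches whole words only, so replacing "black" leaves "blackness" alone.

```php
<?php
// Hypothetical offline variant of synonymize(): $lookup maps a lowercase
// word to an array of synonyms and stands in for the getsynonyms() API call.
function synonymize_with($origtext, array $lookup, $perc = 3, $minlength = 4,
                         array $common = array('never', 'about', 'price')) {
    // strip tags, keep alphanumeric sequences (and hyphens), split on whitespace
    $content = preg_replace('/[^A-Za-z0-9\-]/', ' ', strip_tags($origtext));
    $words = preg_split('/\s+/', strtolower($content), -1, PREG_SPLIT_NO_EMPTY);

    // how many replacements make up $perc percent of the text
    $toswitch = round(count($words) * $perc / 100);

    // unique candidates in text order: not common, longer than $minlength
    $candidates = array();
    foreach (array_unique($words) as $word) {
        if (!in_array($word, $common) && strlen($word) > $minlength) {
            $candidates[] = $word;
        }
    }

    $total_switched = 0;
    foreach ($candidates as $word) {
        if ($total_switched >= $toswitch) break; // target percentage reached
        if (empty($lookup[$word])) continue;     // no synonyms for this term
        // whole words only, case-insensitive; the first synonym wins
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
        $origtext = preg_replace($pattern, $lookup[$word][0], $origtext, -1, $count);
        $total_switched += $count;
    }
    return $origtext;
}

$text = '<p>The black night was never silent, blackness everywhere.</p>';
echo synonymize_with($text, array('black' => array('bleak')), 50), "\n";
// <p>The bleak night was never silent, blackness everywhere.</p>
```

With the real 3% setting a text needs at least 17 words before even one replacement is due, which is why the example passes 50.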

Nothing fancy, just a straightforward search-and-replace routine. A 1,200-word text has about 150 candidates, and for 3% synonyms I need to replace 36 words (1,200 × 3 / 100); it can do that. If I were to use it for real, I would build a table of terms that return no results and store frequently used terms. That would speed up the synonymizing, allow the use of preferences, and take some load off the API.
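The table idea could look something like the following minimal sketch. It assumes an in-memory array as the "table"; a database table or a serialized file would play the same role in real use. The name `cached_synonyms()` is hypothetical, and `$fetch` is any callable that returns an array of synonyms, such as `'getsynonyms'`.

```php
<?php
// Hypothetical caching wrapper. $cache maps a lowercase keyword to its
// synonyms; an empty array records a term that returned nothing, so it is
// never requested again. $requests counts the real API calls made.
function cached_synonyms($keyword, callable $fetch, array &$cache, &$requests) {
    $key = strtolower($keyword);
    if (array_key_exists($key, $cache)) {
        return $cache[$key]; // hit: no API request
    }
    $requests++;
    $cache[$key] = $fetch($key); // miss: ask once and remember the answer
    return $cache[$key];
}

$cache = array();
$requests = 0;
$fake = function ($w) { return $w === 'black' ? array('bleak') : array(); };
foreach (array('black', 'Black', 'zzzz', 'zzzz') as $w) {
    cached_synonyms($w, $fake, $cache, $requests);
}
echo $requests, "\n"; // 2
```

Keeping the empty results is what saves the most requests, because terms without synonyms tend to come back in every post.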

Categories
optimization, seo tips and tricks
Tags
optimisation, seo tips and tricks

One Response to “synonymizer with api”

  1. quietaffiliate says:
    02/03/2009 at 9:58 am

    Unique content being key whenever you reuse content, I’ve found that to be sure you’ll always pass any human ‘spot’ check that might occur, you should take part of any article out and/or reformat the paragraph structure. By that I mean spacing/line breaks or any other formatting tags you may have used.

    I’ll do that once every 5-6 posts typically with pretty good results.
