How to grab keywords from 7search

“SEO tips and tricks” was not due until November, but this one just popped up today. I was looking for a tool to build a quick keyword set for a blog, without doing extensive keyword research. The blackhat ‘scraper’ scripts I found come up with ‘michigan seo’ far too many times :)

How to grab keyword sets from 7Search

I want a set of keywords to use as blog categories, so the blog contains material on the most popular keywords and covers the whole active search pattern set. A nice tool for that is 7Search’s keyword tool.

It is captcha-protected: you answer the captcha once and can then query as often as you like. For a keyword it shows last month's top 100 search patterns containing that keyword, with their search volumes :

seo 1,991,112 $0.34 $0.33 $0.21 $0.09 $0.08
seo web design 8,085 $0.07 $0.02 $0.01
seo tool 2,647 $0.05 $0.02 $0.02 $0.01

As I am extremely lazy and hate typing data, I’ll make a quick script so I can cut and paste that list and have it magically transformed into a WordPress blog category list.

It turned out to be a simple one page program : I do a query on a keyword, select the result table area of the 7Search page (with the mouse : ) and paste it as text into my own form’s textarea, add the main key, and post it.

From the $_POST array, I take the textarea input and explode it on linebreaks. To get the keywords, I check for the first occurrence of 0-9, take the part that comes before it, and have the keywords.

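My first attempt looped over the digits 0-9 and stopped at the first digit it found in the line. Because the loop starts at 0, it gets thrown out at the first 0 in the line (or the first 1, 2, 3…), regardless of any other digit that comes before that 0 :

  // naive version: stops at the first digit value it finds, not at the first digit position
  for ($i = 0; $i < 10; $i++) {
      // cast to string: older PHP treats an integer needle as a character code
      $pos = strpos($linesarr[$x], (string) $i);
      if ($pos > 0) {
          if ($pos < $minpos) {
              $mykeys = substr($linesarr[$x], 0, $pos);
              echo $mykeys;
              break;
          }
      }
  }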

So instead I test for the first 0 and store its position in minpos, then test for the first 1, 2, 3… ; whenever a digit occurs earlier in the line than the current minpos, minpos is lowered to that position.

  $lines = $_POST['textarea'];
  $linesarr = explode("\r\n", $lines);

  for ($x = 0; $x < count($linesarr); $x++) {
      // set minpos to the length of the line
      $minpos = strlen($linesarr[$x]);

      // check digits 0-9
      for ($i = 0; $i < 10; $i++) {
          // get position of the first digit $i in the line
          // (cast to string: older PHP treats an integer needle as a character code)
          $pos = strpos($linesarr[$x], (string) $i);
          if ($pos > 0) {
              if ($pos < $minpos) $minpos = $pos;
          }
      }

      // is minpos smaller than the length of the line ? then it is valid data
      if ($minpos < strlen($linesarr[$x])) {
          $mykeys = substr($linesarr[$x], 0, $minpos);
          echo $mykeys;
      }
  }

That way I always get the first number in the line and the part before it is the whole keyword text.
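The same cut can also be sketched with a single regular expression. This is only an illustration, not the script from this post: `extract_keyword` is a hypothetical helper name, and unlike the loop it trims the trailing blank and returns null for a line that starts with a digit.

```php
<?php
// Hypothetical helper: returns the text in front of the first digit of a
// pasted result line, or null when the line has no keyword before a digit.
function extract_keyword(string $line): ?string
{
    // One or more non-digit characters from the start of the line,
    // followed by the first digit (the start of the search volume).
    if (preg_match('/^([^0-9]+)[0-9]/', $line, $m) === 1) {
        return trim($m[1]);
    }
    return null;
}

echo extract_keyword('seo web design 8,085 $0.07 $0.02 $0.01'), "\n";
```

Like the loop version, this breaks on keywords that contain a digit themselves, so it suits lists where the first number on the line is always the volume.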

I also want the search volume, which is the first full string after the keywords, up to the first dollar sign. The minpos counter already points at the first digit of the volume, so I can take the position of the first dollar sign and trim off the blanks.

  // the volume is at the start of the string after minpos
  $volstr = trim(substr($linesarr[$x], $minpos));
  // and before the first dollar sign
  $volcut = strpos($volstr, '$');
  // it contains "," : 9,111,222 so filter out the nonsense for mysql :
  $vol = str_replace(',', '', trim(substr($volstr, 0, $volcut)));

  if ($minpos < strlen($linesarr[$x])) {
      $mykeys = substr($linesarr[$x], 0, $minpos);
      echo $mykeys."_".$vol;
  }
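As a compact alternative, the volume can be picked out as "the first number that is directly followed by a dollar sign". A minimal sketch, assuming every result line has at least one bid column; `extract_volume` is a hypothetical helper name, not part of the original script.

```php
<?php
// Hypothetical helper: returns the search volume of a pasted result line as
// an integer, or null when no number is followed by a dollar sign.
function extract_volume(string $line): ?int
{
    // Digits with optional thousands separators, optional blanks, then "$".
    if (preg_match('/([0-9][0-9,]*)\s*\$/', $line, $m) === 1) {
        // Strip the separators so the value can go into a numeric column.
        return (int) str_replace(',', '', $m[1]);
    }
    return null;
}

var_dump(extract_volume('seo web design 8,085 $0.07 $0.02 $0.01'));
```

Returning an integer instead of a string keeps the later "volume above 2000" comparison numeric.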

[After this I stuff the data in a mysql table `sevencats`].

How to add a keyword list to wordpress as categories

Let's add the keywords to a WordPress blog as categories. WordPress has a very simple function for that: wp_insert_term, in the taxonomy.php file.

In WPMU you first have to pick the target blog, because each blog works on its own table set (wp1_, wp2_ etcetera), and when you start it up the admin user's main blog is the active table set. If you want to add data like categories to another blog's taxonomy table, you have to switch to that table set first.

  function connect_data() {
      $DB_USER = "";
      $DB_PASSWORD = "";
      $DB_HOST = "";
      $DB_DATA = "";
      $link = mysql_connect($DB_HOST, $DB_USER, $DB_PASSWORD) or $error = mysql_error();
      if (!$link) {
          return $error;
      }
      mysql_select_db($DB_DATA, $link) or $error = mysql_error();
      return $link;
  }

  //link
  $cats = connect_data();

  //get array with categories
  $categories = array();

  $qry = "SELECT cat FROM `sevencats`";
  $lst = mysql_query($qry, $cats) or die('list error '.mysql_error());
  while ($row = mysql_fetch_assoc($lst)) {
      $categories[] = $row['cat'];
  }
  //close db connection
  mysql_close($cats);

  //open wordpress connection
  include_once('wp-config.php');
  include_once('wp-includes/wp-db.php');
  include_once('wp-includes/taxonomy.php');

  //select target blog by id
  switch_to_blog(3);

  //insert categories
  for ($i = 0; $i < count($categories); $i++) {
      wp_insert_term($categories[$i], 'category');
  }

  //switch back to users main blog
  restore_current_blog();

For a normal wordpress install you'd not have to switch blogs :

  //open wordpress connection
  include_once('wp-config.php');
  include_once('wp-includes/wp-db.php');
  include_once('wp-includes/taxonomy.php');

  //insert categories
  for ($i = 0; $i < count($categories); $i++) {
      wp_insert_term($categories[$i], 'category');
  }
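Because the keyword is cut off right before the first digit, it still carries its trailing blank, and pasted lists can contain empty lines or repeats. A small clean-up pass before the insert loop could look like the sketch below; `clean_categories` is a hypothetical helper, not something WordPress provides.

```php
<?php
// Hypothetical helper: trims the scraped keywords, drops empty entries and
// removes duplicates, so each category name is handed over only once.
function clean_categories(array $categories): array
{
    $trimmed  = array_map('trim', $categories);
    $nonEmpty = array_filter($trimmed, function (string $name): bool {
        return $name !== '';
    });
    // array_unique keeps the first occurrence; array_values reindexes from 0.
    return array_values(array_unique($nonEmpty));
}

print_r(clean_categories(['seo ', 'seo tool ', 'seo', '', '  ']));
```

The cleaned array can then be passed to the same wp_insert_term loop.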

That gets me the top 100 searches of last month as categories for my new blog. You can fiddle with it a bit and only pick searches with a volume above 2000 monthly searches (just in case you want to go scraping and only want material that gets you into the SERPs for the high volume search terms).
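The volume cut-off can be sketched in plain PHP as follows. The `cat` key matches the column selected from `sevencats`; the `vol` key, the `categories_above` helper and the last sample row are assumptions made up for the illustration.

```php
<?php
// Sample rows: the first three use the figures from the 7Search listing,
// the last one is a made-up low-volume entry to show the filter working.
$rows = [
    ['cat' => 'seo',            'vol' => 1991112],
    ['cat' => 'seo web design', 'vol' => 8085],
    ['cat' => 'seo tool',       'vol' => 2647],
    ['cat' => 'seo example',    'vol' => 950],
];

// Hypothetical helper: keeps the category names whose volume is above $minVolume.
function categories_above(array $rows, int $minVolume): array
{
    $kept = array_filter($rows, function (array $row) use ($minVolume): bool {
        return $row['vol'] > $minVolume;
    });
    return array_values(array_column($kept, 'cat'));
}

print_r(categories_above($rows, 2000));
```

If the volumes are stored in the table, the same cut could be made in the SELECT with a WHERE clause on the volume column instead.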

Next edition : Red Hat Seo (with jingle bells) the Christmas Special :)

Posted in seo tips and tricks, wordpress

zend php and google webmaster api II : wordpress mu auto-register

Part Deux of automating the registration and verification of a wordpress blog. In the previous post I showed how to add a site to google webmaster tools.

Which site you ask ? Oh dear… in the previous post I did not mention how to create a new blog in wpmu :

  include_once('wp-config.php');
  include_once('wp-includes/wp-db.php');
  include_once('wp-includes/wpmu-functions.php');
  $newblogid = wpmu_create_blog('tryout.blacknorati.com', '/', 'tryout', 1);

Very basic, assuming I am the admin user (with ID=1). After creating the blog, I post its URL to Google Webmaster Tools to start the registration. Then I want to

  • verify the site
  • add a sitemap
  • and blog on!

verifying a site

I can add any url to Google Webmaster Tools, but I only get to use the tools once Google are sure I ‘own’ the domain or subdomain. Verification is done by checking on the presence of a header metatag in the index file, or a specific file on the server. Once Google spots it, Google know I control the site and I can use the webmaster tools.

On a WordPress Mu install I do not, as user, get to have my own template. I currently have 100 standard templates installed to choose from, some with options and widgets and that should be enough. But editing the template itself is not possible for separate users, so I cannot verify sites with a header metatag.

The alternative is putting a file on the server with a particular codename, but users don’t have an actual separate subdomain with a wordpress Mu install, so that one also won’t work.

Eek ! Well, no problem, Google also accept a post with the filename in the url. Just blog a post with the google___.html filename as title, WordPress automatically turns the title into the url and you can use that post to have Google verify the site is yours.

getting the verification filename

A Google Webmaster Tools account has its own standard verification code, and it’s valid for every site. Once a user has registered the site with GWT, I can retrieve that code from the site's data feed :

  function get_verification_title($domain, $client) {
      $myfeed = get_site($domain, $client);
      foreach ($myfeed as $item) {
          $tags     = "";
          $subjects = $item->{"wt:verification-method"};
          if (is_array($subjects) and count($subjects) > 0) {
              return $subjects[1];
          }
      }
  }

  function get_site($domain, $client) {
      $fdata = new Zend_Gdata($client);
      $tgt = "https://www.google.com/webmasters/tools/feeds/sites/".htmlentities(urlencode('http://'.$domain.'/'));
      $result = $fdata->getFeed($tgt);
      return $result;
  }
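The feed address that get_site puts together can be looked at on its own. A minimal sketch; `site_feed_url` is a hypothetical helper name. It leaves out the htmlentities() call, because urlencode() only emits letters, digits, `-`, `_`, `.`, `%` and `+`, none of which htmlentities() would change.

```php
<?php
// Hypothetical helper: builds the feed address of a single site by appending
// the url-encoded site URL to the sites feed.
function site_feed_url(string $domain): string
{
    $base = 'https://www.google.com/webmasters/tools/feeds/sites/';
    // The site URL becomes one path segment, so ":" and "/" must be escaped.
    return $base . urlencode('http://' . $domain . '/');
}

echo site_feed_url('test.blacknorati.com'), "\n";
```

Keeping the address in one helper means the same encoding is used wherever a single site's feed is requested.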

With the get_site function I retrieve the site’s Atom list as a Zend feed. The feed contains two wt:verification-method tags, one for the metatag and one for the html file. This function loads both into the $subjects array and I pick item [1] (it’s a 0-based array), the html file name. I need that one to go post on the new blog. Here is a php routine taken from Snipplr.

  function add_verify_post($domain, $verification, $logon, $pass) {
      $category = '';
      $req = 'title='. $verification . '&content=' . $verification . '&category=' . $category . '&logon=' . $logon . '&pass=' . $pass;
      $header  = "POST /remote_post.php HTTP/1.0\r\n";
      $header .= "Host: ". $domain."\r\n";
      $header .= "Content-Type: application/x-www-form-urlencoded\r\n";
      $header .= "Content-Length: " . strlen($req) . "\r\n";
      $header .= "Connection: Close\r\n\r\n";
      $fp = fsockopen($domain, 80, $errno, $errstr, 30);
      $SUCCESS = false;
      $last_line = '';

      if (!$fp) {
          $status_message = "$errstr ($errno)";
          $res = "FAILED";
      }
      else {
          fputs($fp, $header . $req);
          while (!feof($fp) && $SUCCESS == false) {
              $res = fgets($fp, 1024);
              // fgets keeps the line break, so trim before comparing
              if (strcmp(trim($res), "SUCCESS") == 0) {
                  $SUCCESS = true;
              }
              if (!empty($res)) {
                  $last_line = $res;
              }
          }
          // only close the socket when it was actually opened
          fclose($fp);
      }

      if ($SUCCESS == false) {
          echo $last_line;
      }
  }

The remote_post.php code is the same as the snippet.

I am the owner of the blog so I can use the standard admin login and password in the function. For security purposes I’d use a different login and password for remote access though (this one does not use SSL).
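The request body in add_verify_post is glued together without escaping, which goes wrong as soon as a password or title contains `&`, `=` or a blank. A hedged sketch of the same body built with PHP's http_build_query(); `build_post_body` is a hypothetical helper name and the sample password is made up.

```php
<?php
// Hypothetical helper: builds the url-encoded form body for the verification
// post, with the same field names as the hand-built $req string.
function build_post_body(string $title, string $logon, string $pass, string $category = ''): string
{
    return http_build_query([
        'title'    => $title,
        'content'  => $title,
        'category' => $category,
        'logon'    => $logon,
        'pass'     => $pass,
    ], '', '&');
}

// Made-up credentials with characters that need escaping.
echo build_post_body('google12345.html', 'MyLogin', 'My&Pass word'), "\n";
```

The result can be used in place of $req; the Content-Length header then still comes from strlen() of the encoded string.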

With a simple call I send one new post to the new blog with the google verification file name as title.

  add_verify_post('BlogSubdomain.blacknorati.com', 'google12345.html', 'MyLogin', 'MyPassword');

I had some doubts about Google accepting blog.blacknorati.com/year/month/google12345html, but they actually accept it, so I don't have to adapt the permalink settings.

Now I have to send Google a 'verify' XML message :

  function verify_site($domain, $client) {
      // domain without http
      // (the XML tags of the Atom entry did not survive in this listing)
      $xml = '
      http://'.$domain.'';
      $xml .= "";
      $xml .= '
      ';
      $fdata = new Zend_Gdata($client);
      $result = $fdata->post($xml, "https://www.google.com/webmasters/tools/feeds/sites/".urlencode('http://'.$domain)."/");
      return $result;
  }

Presto, now Google know I control the site, and I can use the webmaster tools. That means I can add the sitemap, and that in turn means my sites are indexed a lot faster.

  function add_webmap($domain, $sitemap, $client) {
      // domain without http
      // (the XML tags of the Atom entry did not survive in this listing)
      $xml = '
      http://'.$sitemap.'';
      $xml .= "
      WEB
      ";

      $fdata = new Zend_Gdata($client);
      $myaddress = "https://www.google.com/webmasters/tools/feeds/".htmlentities(urlencode('http://'.$domain.'/'), ENT_QUOTES)."/sitemaps/";
      $result = $fdata->post($xml, $myaddress);
      return $result;
  }

Happy now. The Google Webmaster Tools API was top of my wish-list. Now I can register and verify 32,000 sites with sitemaps automatically, so that saves me at least 2500 hours of work. And it was actually easier than I thought, with the proper examples and snippets available online.

I am going to clean up the code a bit and stuff it in a class, and move on to developing large scale 'grey' ops :)

Posted in php, wordpress

zend php and google webmaster tools api

update 2: Sandrine worked out a set of routines, as far as I know using Zend 1.7, she lists the code here.

update: Google updated their API in October (almost at the time I wrote these posts) and this code fails, as it is still based on the V1 API. You can access the whole WT: toolset namespace (including sitemaps, verification) through the V2 API now, but you need to send a version id along with your request; that is handled in the new Zend 1.7 download.

The Problem

I can add 32,000 blogs on a standard WordPress Mu install. How do I add 32,000 subdomains, verify them and add their sitemaps to Google Webmaster Tools, without having to go to the webmaster page about 96,000 times ?

The solution

Integrating Google Webmaster Tools API into my WordPress Mu install.

What is it worth ?

If registering a site, verifying it and adding a sitemap each take 5 minutes per domain, at €12 per hour, that makes it 96,000 euros and 4 labor years for 32,000 sites. Writing a script is worth €96,000 and saves me four years of mindless drone work, so that is well worth having a look at.
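The figures add up when the five minutes are read as the time for each of the three visits per site (register, verify, add sitemap). A small worked calculation under that reading; the 2000 working hours per labor year is an assumption, not a number from this post.

```php
<?php
$sites          = 32000; // blogs on the install
$stepsPerSite   = 3;     // register, verify, add sitemap
$minutesPerStep = 5;     // assumed to apply to each step
$hourlyRate     = 12;    // euros per hour
$hoursPerYear   = 2000;  // assumption: one labor year

$visits     = $sites * $stepsPerSite;        // trips to the webmaster page
$hours      = $visits * $minutesPerStep / 60;
$cost       = $hours * $hourlyRate;
$laborYears = $hours / $hoursPerYear;

printf("%d visits, %d hours, %d euros, %.1f labor years\n", $visits, $hours, $cost, $laborYears);
```

Read as five minutes for all three steps together, the same sum gives a third of these amounts.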

Software : Zend

Zend gData is a php framework that is programmed to handle Google Data. Their ClientLogin routine isn’t very flexible and they haven’t covered GWT Api yet, so I’ll have to hack some routines together.

After getting stonewalled by the zend program a few times, I went searching and ended up on ngoprekweb who have a nice post on ClientLogin authorization for the blogger api. Eris Ristemena uses a modified Zend ClientLogin, very nice work. I installed the adapted classes and tried that one to get through the ClientLogin, and it paid off.

The good stuff : Gwt api access

I am not interested in the blogger stuff though, I want access to Google Webmaster Tools (GWT), so I reworked Eris Ristemena’s blogger routine a little.

  set_include_path(dirname(__FILE__) . '/Zend_Gdata');
  require_once 'Zend.php';
  Zend::loadClass('Zend_Gdata_ClientLogin');
  Zend::loadClass('Zend_Gdata');
  Zend::loadClass('Zend_Feed');

  $username     = '';
  $password     = '';
  $service      = 'sitemaps';
  $source       = 'Zend_ZendFramework-0.1.1'; // companyName-applicationName-versionID
  // both fields are only posted after a captcha challenge
  $logintoken   = isset($_POST['captchatoken']) ? $_POST['captchatoken'] : null;
  $logincaptcha = isset($_POST['captchaanswer']) ? $_POST['captchaanswer'] : null;

  try {
      $resp = Zend_Gdata_ClientLogin::getClientLoginAuth($username, $password, $service, $source, $logintoken, $logincaptcha);

      if ( $resp['response']=='authorized' )
      {
          $client = Zend_Gdata_ClientLogin::getHttpClient($resp['auth']);
          $gdata = new Zend_Gdata($client);

          $feed = $gdata->getFeed("https://www.google.com/webmasters/tools/feeds/sites/");
          foreach ($feed as $item) {
              // prints one entry per site (the echoed markup did not survive in this listing)
              echo '';
          }
      }
      elseif ( $resp['response']=='captcha' )
      {
          echo 'Google requires you to solve this CAPTCHA image';
          // the captcha image and the answer form were echoed here
          // (their HTML did not survive in this listing)
          echo 'Answer :';
          exit;
      }
      else
      {
          // there is no way you can go here, some exceptions must have been thrown
      }

  } catch ( Exception $e ) {
      echo $e->getMessage();
  }

(I added https://www.google.com/accounts/ to the captcha image source, otherwise it keeps drawing blanks.)

Zend uses a “HttpClient” for the connection to Google, and a gData class (usually the main ‘feed’, blogs, sites) that you use to do basic data manipulation. All feed entries are in Atom format with a custom namespace.

Now I am going to add a domain. In my add_site function I put an XML Atom together to post (using the post() function of the gData class) to the sites feed url, and the Google API does the rest :

  function add_site($domain, $client) {
      // Atom entry for the new site (the XML tags did not survive in this listing)
      $xml = '';
      $xml .= '';
      $xml .= '';
      $fdata = new Zend_Gdata($client);
      $result = $fdata->post($xml, "https://www.google.com/webmasters/tools/feeds/sites/");
      return $result;
  }
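For orientation, an Atom entry for adding a site could be assembled roughly as below. This is a sketch from memory of the since-retired Webmaster Tools Data API, not the markup used in this post: the element layout (an atom:entry whose atom:content src holds the site URL) and the `site_entry_xml` helper name are assumptions.

```php
<?php
// Hypothetical helper: builds an Atom entry that points at the site URL.
// The element layout is an assumption about the retired API.
function site_entry_xml(string $domain): string
{
    // Escape the URL so it is safe inside an XML attribute.
    $url = htmlspecialchars('http://' . $domain . '/', ENT_QUOTES);

    return '<atom:entry xmlns:atom="http://www.w3.org/2005/Atom">'
         . '<atom:content src="' . $url . '" />'
         . '</atom:entry>';
}

echo site_entry_xml('test.blacknorati.com'), "\n";
```

A string like this would be what gets handed to the post() call as $xml.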

In the main routine I pass the domain and the running httpclient to the add_site() function :

  if ( $resp['response']=='authorized' )
  {
      $client = Zend_Gdata_ClientLogin::getHttpClient($resp['auth']);
      echo add_site('test.blacknorati.com', $client);
  }

Cool. That saves me up to 32,000 site registrations. The rest of it is still Greek to me, but this part functions. Next week : more nonsense (verify the site, add a sitemap, and integrate it in the blog creation function of WordPress Mu).

1) about the blogger function : I tried to list the blogger posts with the ngoprekweb php code, but it seems blogger use a different string these days to identify the blog in gData, the id is returned as “tag:blogger.com-blabla-(blogid)” and you want the last part to access the blogs post atom feed :

  // explode() replaces split(), which is no longer available in current PHP
  $idText = explode('-', $item->id());
  $blogid = $idText[2];

(modified from the Zend 1.6.1 codebase)

  foreach ($feed as $item) {
      echo '<a href="'.$item->link().'">' . $item->title() . '</a>';

      // explode() replaces split(), which is no longer available in current PHP
      $idText = explode('-', $item->id());
      $blogid = $idText[2];

      $feed1 = $gdata->getFeed("http://www.blogger.com/feeds/$blogid/posts/summary");
      //…
  }
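Since the blog id is the last dash-separated part of the entry id, it can also be taken without relying on a fixed index. A minimal sketch; `blogger_blog_id` is a hypothetical helper and the sample id string is made up to match the "tag:blogger.com-…-(blogid)" shape described above.

```php
<?php
// Hypothetical helper: returns the part after the last dash of an entry id.
function blogger_blog_id(string $entryId): string
{
    $parts = explode('-', $entryId);
    // end() returns the last element, however many dashes the id contains.
    return end($parts);
}

// Made-up id in the shape described in the text.
echo blogger_blog_id('tag:blogger.com,1999:user-123.blog-456'), "\n";
```

Taking the last part keeps working if the prefix ever gains or loses a dash.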
Posted in php, tool, wordpress