serp tool

I was putting together a serp tool with a database and an emailer on a cronjob, mainly because I am lazy. I always get cranky when I have to type in these keywords again; I ain’t a teletubby.

I wanted an automated one with a history that sends me an email every day, so I started putting one together, but I got distracted because the gethost server was shut down and my mom’s Mary Chapel event blog was on it. My old site was also on it, with my crappy blog, but losing that ain’t half as bad as losing yer mom’s Mary Chapel blog; that is bad karma.

So I got her a domain and put it on one of my accounts, installed a new blog and made sure it ranks number one in Google again. Apart from that I was putting together a scraper and still needed some cron jobs for the scraper scripts.

I found a nice free cron job site, standard 5 jobs a day. It’s free and it works flawlessly, no hassles. Very nice.

Once I had that I remembered I still had to finish that serp thing as well, so I finished the basic routines today. It still needs some testing, and it could use the Yahoo and MSN serps. I’ll grab them from a WordPress widget and add some language options; after that it should be a fine tool.

I’ll put the scripts up for download in two or three weeks or something.

zend php and google webmaster tools api

update 2: Sandrine worked out a set of routines, as far as I know using Zend 1.7; she lists the code here.

update: Google updated their API in October (almost at the time I wrote these posts) and this code fails as it is still based on the V1 API. You can access the whole WT: toolset namespace (including sitemaps, verification) through the V2 API now, but you need to send a version id along with your request; that is handled in the new Zend 1.7 download.

The Problem

I can add 32,000 blogs on a standard WordPress Mu install. How do I add 32,000 subdomains, verify them and add their sitemaps to Google Webmaster Tools, without having to go to the webmaster page about 96,000 times ?

The solution

Integrating Google Webmaster Tools API into my WordPress Mu install.

What is it worth ?

If registering a site, verifying it and adding a sitemap each take 5 minutes, that is 96,000 steps or 8,000 hours of work for 32,000 sites. At €12 per hour that makes €96,000 and about four labor years. Writing a script is worth €96,000 and saves me four years of mindless drone work, so that is well worth having a look at.
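The sum is easy to check in a few lines of PHP. This is only a sketch: it assumes the 5 minutes apply to each of the three steps (register, verify, add sitemap), which is the reading that gives 96,000, and it assumes a labor year of 2,000 working hours. The variable names are made up for the illustration.

```php
<?php
// Back-of-the-envelope cost of doing it all by hand.
// Assumptions: 5 minutes per step, 3 steps per site,
// and 2,000 working hours in a labor year.
$sites          = 32000;
$stepsPerSite   = 3;     // register, verify, add sitemap
$minutesPerStep = 5;
$euroPerHour    = 12;
$hoursPerYear   = 2000;

$pageVisits = $sites * $stepsPerSite;              // trips to the webmaster page
$hours      = $pageVisits * $minutesPerStep / 60;  // total hours of clicking
$cost       = $hours * $euroPerHour;               // in euros
$years      = $hours / $hoursPerYear;              // labor years

printf("%d visits, %d hours, %d euros, %d labor years\n",
       $pageVisits, $hours, $cost, $years);
// prints "96000 visits, 8000 hours, 96000 euros, 4 labor years"
```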

Software : Zend

Zend gData is a PHP framework that is built to handle Google Data feeds. Its ClientLogin routine isn’t very flexible and it doesn’t cover the GWT API yet, so I’ll have to hack some routines together.

After getting stonewalled by the Zend program a few times, I went searching and ended up on ngoprekweb, which has a nice post on ClientLogin authorization for the blogger API. Eris Ristemena uses a modified Zend ClientLogin, very nice work. I installed the adapted classes and tried that one to get through the ClientLogin, and it paid off.

The good stuff : GWT API access

I am not interested in the blogger stuff though; I want access to GWT (Google Webmaster Tools), so I reworked Eris Ristemena’s blogger routine a little.

set_include_path(dirname(__FILE__) . '/Zend_Gdata');
  require_once 'Zend.php';
  Zend::loadClass('Zend_Gdata_ClientLogin');
  Zend::loadClass('Zend_Gdata');
  Zend::loadClass('Zend_Feed');

  $username     = '';
  $password     = '';
  $service      = 'sitemaps';
  $source       = 'Zend_ZendFramework-0.1.1'; // companyName-applicationName-versionID
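  // only set after the captcha form has been submitted
  $logintoken   = isset($_POST['captchatoken'])  ? $_POST['captchatoken']  : null;
  $logincaptcha = isset($_POST['captchaanswer']) ? $_POST['captchaanswer'] : null;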

  try {
    $resp = Zend_Gdata_ClientLogin::getClientLoginAuth($username,$password,$service,$source,$logintoken,$logincaptcha);

    if ( $resp['response']=='authorized' )
    {
      $client = Zend_Gdata_ClientLogin::getHttpClient($resp['auth']);
      $gdata = new Zend_Gdata($client);

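      $feed = $gdata->getFeed("https://www.google.com/webmasters/tools/feeds/sites/");
      foreach ($feed as $item) {
        // assumed: print the title of each registered site
        echo '<p>' . $item->title() . '</p>';
      }
    }
    elseif ( $resp['response']=='captcha' )
    {
      // 'captchaurl' and 'captchatoken' are assumed key names;
      // check them against the modified ClientLogin class
      echo 'Google requires you to solve this CAPTCHA image<br />';
      echo '<img src="https://www.google.com/accounts/' . $resp['captchaurl'] . '" /><br />';
      echo '<form action="" method="post">';
      echo 'Answer : <input type="text" name="captchaanswer" />';
      echo '<input type="hidden" name="captchatoken" value="' . $resp['captchatoken'] . '" />';
      echo '<input type="submit" />';
      echo '</form>';
      exit;
    }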
    else
    {
      // there is no way you can go here, some exceptions must have been thrown
    }

  } catch ( Exception $e )  {
    echo $e->getMessage();
  }

(I added https://www.google.com/accounts/ to the captcha image source, otherwise it keeps drawing blanks.)
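A minimal sketch of that prefixing, assuming the login response hands back a relative path along the lines of "Captcha?ctoken=...". The function name and the sample token are made up for the illustration; they are not part of Zend or of the ngoprekweb classes.

```php
<?php
// Turns the captcha path from the login response into a full image URL.
// Assumption: the response carries a relative path such as "Captcha?ctoken=...".
function captcha_image_url($captchaPath)
{
    // Leave absolute URLs alone so the prefix is never added twice.
    if (preg_match('#^https?://#i', $captchaPath)) {
        return $captchaPath;
    }
    return 'https://www.google.com/accounts/' . ltrim($captchaPath, '/');
}

// Made-up token, just to show the result
echo captcha_image_url('Captcha?ctoken=abc123'), "\n";
// prints "https://www.google.com/accounts/Captcha?ctoken=abc123"
```

The check for an absolute URL is there so the helper stays harmless if the login class ever starts returning full URLs itself.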

Zend uses an “HttpClient” for the connection to Google, and a gData class (usually the main ‘feed’: blogs, sites) that you use to do basic data manipulation. All feed entries are in Atom format with a custom namespace.

Now I am going to add a domain. In my add_site function I put an Atom XML entry together and post it (using the post() function of the gData class) to the sites feed URL, and the Google API does the rest :

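function add_site($domain, $client) {
    // build the Atom entry for the new site
    $xml = '';
    $xml .= '';
    $xml .= '';
    // post it to the sites feed over the authorized connection
    $fdata = new Zend_Gdata($client);
    $result = $fdata->post($xml, "https://www.google.com/webmasters/tools/feeds/sites/");
    return $result;
}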
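For illustration, here is a sketch of what such an Atom entry could look like when built as a string. It rests on an assumption: that the sites feed takes an atom:entry whose atom:content element carries the site URL in its src attribute, so check that against the current API documentation before relying on it. The helper name build_site_entry is made up; it is not part of Zend.

```php
<?php
// Builds the Atom entry for adding one site.
// Assumption: the sites feed accepts an atom:entry whose atom:content
// "src" attribute holds the site URL.
function build_site_entry($domain)
{
    // Escape the URL so characters like & or " cannot break the XML.
    $url = htmlspecialchars('http://' . $domain . '/', ENT_QUOTES, 'UTF-8');

    $xml  = '<atom:entry xmlns:atom="http://www.w3.org/2005/Atom">';
    $xml .= '<atom:content src="' . $url . '" />';
    $xml .= '</atom:entry>';
    return $xml;
}

echo build_site_entry('test.blacknorati.com'), "\n";
```

The string it returns is what would go into post() as the first argument.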

In the main routine I pass the domain and the running httpclient to the add_site() function :

   if ( $resp['response']=='authorized' )
    {
      $client = Zend_Gdata_ClientLogin::getHttpClient($resp['auth']);
      echo add_site('test.blacknorati.com', $client);
    }

Cool. That saves me up to 32,000 site registrations. The rest of it is still Greek to me, but this part functions. Next week : more nonsense (verify the site, add a sitemap, and integrate it in the blog creation function of WordPress Mu).

1) about the blogger function : I tried to list the blogger posts with the ngoprekweb PHP code, but it seems Blogger uses a different string these days to identify the blog in gData. The id is returned as “tag:blogger.com-blabla-(blogid)” and you want the last part to access the blog’s post Atom feed :

        $idText = explode('-', $item->id());
        $blogid = $idText[2];

(modified from the Zend 1.6.1 codebase)

      foreach ($feed as $item) {
        echo $item->title();

        // cut the id on '-' and take the blog id part
        $idText = explode('-', $item->id());
        $blogid = $idText[2];

        $feed1 = $gdata->getFeed("http://www.blogger.com/feeds/$blogid/posts/summary");
        //...
      }
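Since it is the last part of the id that matters, a slightly more forgiving way to get it is to take whatever follows the final dash, instead of counting on a fixed position. A small sketch; the function name and the sample id are made up, and it assumes the blog id is always the piece after the last '-'.

```php
<?php
// Pulls the blog id out of a Blogger Atom id.
// Assumption: the blog id is whatever follows the last '-' in the id string.
function blog_id_from_atom_id($atomId)
{
    $parts = explode('-', $atomId);
    return end($parts);   // last piece after the final dash
}

// Made-up id in the shape described above
echo blog_id_from_atom_id('tag:blogger.com-blabla-1234567'), "\n";
// prints "1234567"
```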

serp tool 2008

I never got around to building a serp tool with a MySQL backend, one on a cronjob with an email option.
This weekend I got a free OSWD template, so at least it looks like a website. I develop a lot easier when it has a template: filling in the blanks.

Often if you program a rough sketch, in time it sort of develops itself. If you don’t start, nothing gets done.

I am developing it on juust.org/serp/; once it’s finished I’ll put one on trismegistos.net and put the source up for download.

It’s flimsy but the idea of a serp minisite with emailer option is commercially attractive.

It’s also important for the blog to have a ‘serp’ page in the menu, and a serp page should have serp tools, in abundance, all kinds of them.