spot the bot

I have some overhead scripts that fetch data and can add a few seconds to the loading time. Having traffic trigger these tasks saves me the trouble of setting up cron jobs, but I don’t want to run overhead scripts while visitors or Googlebot are on the site. Apart from that, some routines use a lot of resources, which are wasted on some crawlers.

I actually want the crawlers to come around, so I will make two arrays: one with known bots and one with allowed bots. Any bot that is not on the white-list gets a meager page with overhead jobs attached to it; the rest (in other words visitors and the big search engines) get the standard page.

There are truckloads of bots (see crawltrack); for my purposes a few regulars will do.


//hook it into 'init', so it runs on every request
add_action( 'init', 'spotabot' );

/**
 * Checks if the visitor is a bot.
 *
 * Checks the HTTP_USER_AGENT string to see if the visitor is a
 * non-essential bot and defines the constant IS_A_BAD_BOT accordingly,
 * so templates can test it with: if(IS_A_BAD_BOT) {}
 *
 * @return void
 */
function spotabot()
{
    $bot_list = array("Teoma", "betaBot", "alexa", "froogle", "Gigabot", "inktomi",
    "looksmart", "URL_Spider_SQL", "Firefly", "NationalDirectory",
    "Ask Jeeves", "TECNOSEEK", "InfoSeek", "WebFindBot", "girafabot",
    "crawler", "www.galaxy.com", "Googlebot", "Scooter", "Slurp",
    "msnbot", "appie", "FAST", "WebBug", "Radian6", "Spade", "ZyBorg", "rabaz",
    "Baiduspider", "Feedfetcher-Google", "TechnoratiSnoop", "Rankivabot",
    "Mediapartners-Google", "Sogou web spider", "WebAlta Crawler");

    $bot_allowed = array("Googlebot", "Feedfetcher-Google", "Mediapartners-Google", "Slurp", "Baiduspider", "msnbot");

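    //the header can be missing, treat that as an ordinary visitor
    $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

    foreach($bot_list as $bot) {
        //stripos is case-insensitive and returns false (not 0) when there is no match
        if(stripos($agent, $bot) !== false)
        {
            //white-listed bots get the standard page, every other listed bot is 'bad'
            define("IS_A_BAD_BOT", !in_array($bot, $bot_allowed));
            return;
        }
    }

    //no bot name found in the user agent : a visitor
    define("IS_A_BAD_BOT", false);
}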
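
The matching logic can be tried outside WordPress with a pure function that takes the user agent string as an argument. This is only a sketch under assumptions: classify_user_agent and its three return labels are hypothetical names, not part of the function above, and the shortened lists and sample user agent strings are just examples.

```php
<?php
// Hypothetical helper: classifies a user agent string as 'visitor',
// 'allowed_bot' or 'bad_bot'. Names and labels are made up for this sketch.
function classify_user_agent($user_agent, array $bot_list, array $bot_allowed)
{
    foreach ($bot_list as $bot) {
        // stripos() is case-insensitive and returns false when nothing matches
        if (stripos($user_agent, $bot) !== false) {
            return in_array($bot, $bot_allowed, true) ? 'allowed_bot' : 'bad_bot';
        }
    }
    return 'visitor';
}

// Shortened example lists, the full lists work the same way
$bots    = array('Teoma', 'Googlebot', 'Slurp', 'Gigabot');
$allowed = array('Googlebot', 'Slurp');

echo classify_user_agent('Mozilla/5.0 (compatible; Googlebot/2.1)', $bots, $allowed), "\n";
echo classify_user_agent('Gigabot/3.0', $bots, $allowed), "\n";
echo classify_user_agent('Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0', $bots, $allowed), "\n";
```

Keeping the check free of $_SERVER and define() makes it easy to feed it a few known user agents before hooking anything into 'init'.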

In templates and functions I can use some simple code to run stuff conditionally:

if (defined('IS_A_BAD_BOT')) {
    if (IS_A_BAD_BOT) {
        echo "hi bot\n";
        run_time_consuming_overhead_tasks();
        and_omit_the_sidebar();
    } else {
        echo "hello wonderful visitor\n";
    }
}
//if it is not defined it is not a bot or the function ain't present,
//I am lazy and sloppy and don't want a code-break

It would be nice if WordPress built in a switch to run plugins conditionally.

One related smart plugin is the Chennai Central plugin, which sends 304 Not Modified headers on conditional GETs, so crawlers don’t fetch the page again. That can save some bandwidth and server load.
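
The idea behind a conditional GET can be sketched in a few lines. This is not the plugin's code, only an illustration: is_not_modified is a hypothetical helper, and the timestamp stands in for whatever 'last changed' time the site keeps.

```php
<?php
// Hypothetical helper: true when the client's copy is still fresh, so a
// "304 Not Modified" without a body would be enough.
function is_not_modified($if_modified_since, $last_modified_ts)
{
    if ($if_modified_since === null || $if_modified_since === '') {
        return false;                  // not a conditional GET
    }
    $since = strtotime($if_modified_since);
    if ($since === false) {
        return false;                  // unparsable date, send the full page
    }
    return $last_modified_ts <= $since;
}

// Example 'last changed' time : 1 August 2009, 12:00:00 GMT
$last_modified = gmmktime(12, 0, 0, 8, 1, 2009);

var_dump(is_not_modified('Sat, 01 Aug 2009 12:00:00 GMT', $last_modified));
var_dump(is_not_modified('Fri, 31 Jul 2009 12:00:00 GMT', $last_modified));
var_dump(is_not_modified(null, $last_modified));

// On a real request the first argument would come from
// $_SERVER['HTTP_IF_MODIFIED_SINCE'], and a true result would be answered
// with header('HTTP/1.1 304 Not Modified') followed by exit.
```

The first call reports a fresh copy, the other two ask for the full page.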

wordpress : fun with pluggable classes

I was checking an idea I had about writing a small user class with an option to ‘plug in’ functions for WordPress.

This page covers most of it:
Dynamically add functions to PHP classes @ www.gen-x-design.com. The class construct at the end of the comment thread, Martin Pietschmann’s contribution, is rather useful. The pattern revolves around importing functionality from pluggable classes and exposing it through one object instance. (The ‘decorator’ pattern mentioned there is mostly used for writing extended classes: different objects with the same base data and functionality, ‘views’ of a sort.)

I can include the file with the base, import and user classes in functions.php and, on making a user object, have it read the directory and import function modules (or load functionality conditionally, based on user role/authorization).
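
The base and import classes themselves are not listed in this post. As a rough idea of how such a pair can work, here is a hypothetical reconstruction built on PHP's __call and __get magic methods. It is an assumption about the pattern, not the code from the linked comment thread, and setHost, DemoUser and DemoGreeting are names made up for the sketch.

```php
<?php
// Hypothetical reconstruction for illustration; the real MI_Base and
// MI_Importable from the linked comment thread may be built differently.
abstract class MI_Importable
{
    protected $host;                       // object the plugin is imported into

    public function setHost($host)
    {
        $this->host = $host;
    }

    // Unknown properties are looked up on the host, so plugin code can
    // write $this->first_name as if it lived in the host class.
    public function __get($name)
    {
        return $this->host->$name;
    }
}

abstract class MI_Base
{
    private $imported = array();           // method name => plugin object

    public function import(MI_Importable $plugin)
    {
        $plugin->setHost($this);
        foreach (get_class_methods($plugin) as $method) {
            if ($method === 'setHost' || strpos($method, '__') === 0) {
                continue;                  // skip plumbing and magic methods
            }
            $this->imported[$method] = $plugin;
        }
    }

    // Calls to methods the host does not have are forwarded to the plugin.
    public function __call($method, $args)
    {
        if (!isset($this->imported[$method])) {
            throw new BadMethodCallException("Unknown method: $method");
        }
        return call_user_func_array(array($this->imported[$method], $method), $args);
    }
}

class DemoUser extends MI_Base
{
    public $first_name = 'Jan';
}

class DemoGreeting extends MI_Importable
{
    public function greeting()
    {
        return 'hello ' . $this->first_name;
    }
}

$demo = new DemoUser();
$demo->import(new DemoGreeting());
echo $demo->greeting(), "\n";
```

In this sketch the host only exposes its public properties to the plugin, which keeps the plumbing short.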

For this example I used the WordPress options table. I write a new functions class…

 
class UserBogusPlugin extends MI_Importable
{
	public function the_anchor() {
		//user_url, user_nicename and first_name are exposed through the user class
		//I plug the functions class into, so I can use the $this reference
		//as if I am writing code in the user class
		if($this->user_url != '') {
			return '<a href="'.$this->user_url.'">'.$this->user_nicename .', ' . $this->first_name.'</a>';
		}
		//no url, no anchor....
		return $this->user_nicename;
	}
}

…store the added class name in the options table…

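//load the stored module names from the options table
$modules = get_option('usermodules');
$usermodules = $modules ? json_decode($modules, true) : array();
if(!is_array($usermodules)) $usermodules = array();
//add module (only once)
if(!in_array('UserBogusPlugin', $usermodules)) $usermodules[] = 'UserBogusPlugin';
//store back in options; update_option also creates the option when it is
//missing, while add_option leaves an existing value untouched
update_option('usermodules', json_encode($usermodules));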
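
The JSON round trip can be checked without a WordPress install. The helper below is a sketch with a made-up name (add_module_name); it takes the stored string, or false as get_option returns for a missing option, and gives back the string to store.

```php
<?php
// Hypothetical helper: takes the stored JSON (or false when the option does
// not exist yet) and returns the JSON to store back.
function add_module_name($stored_json, $module)
{
    $modules = $stored_json ? json_decode($stored_json, true) : array();
    if (!is_array($modules)) {
        $modules = array();                // stored value was not a JSON list
    }
    if (!in_array($module, $modules, true)) {
        $modules[] = $module;              // no duplicates on repeated runs
    }
    return json_encode(array_values($modules));
}

echo add_module_name(false, 'UserBogusPlugin'), "\n";
echo add_module_name('["UserBogusPlugin"]', 'UserOtherPlugin'), "\n";
```

Inside WordPress the result would go straight into the options table, along the lines of update_option('usermodules', add_module_name(get_option('usermodules'), 'UserBogusPlugin')).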

…and load the plugin classes when instantiating the user object :

class User extends MI_Base
{
	public function __construct($ID) {
            $modules = get_option('usermodules');
            if($modules) {
                $usermodules = json_decode($modules);
                foreach($usermodules as $module) {
                  if(class_exists($module)) $this->import(new $module);
                 //or..
                 //include('plugclass/'.$module.'.class.php');
                 //$this->import(new $module);
                }
            }
            $this->ID=$ID;
	}
}

In the WordPress template I can use the added functionality through the user instance:

      $my_user = new User($user_id);
      ...
      echo $my_user->the_anchor();

Fun with classes :)

add. 3-8 (qed) :
A function to load a directory with plugin files. I add a header /* plugin pluginfilename */ to each plugin file and check every file in the directory for that header.

       public function getPlugins()
        {
            //always return an array, also when the directory cannot be opened
            $retval  = array();
            $plugdir = TEMPLATEPATH .'/plug';
            if (is_dir($plugdir) && ($handle = opendir($plugdir))) {
                while (false !== ($file = readdir($handle))) {
                    $path = $plugdir.'/'.$file;
                    //skips ".", ".." and subdirectories
                    if (!is_file($path)) continue;
                    //the header sits at the top, the first 1024 bytes will do
                    $contents = (string) file_get_contents($path, false, null, 0, 1024);
                    //check for header : /* plugin pluginname */, grab pluginname
                    if (preg_match('~/\*\s*plugin\s+(\S+)~i', $contents, $m)) {
                        $retval[] = array($path, $m[1]);
                    }
                }
                closedir($handle);
            }
            return $retval;
        }


	public function __construct($id) {
            //get the array with plugin files
            $usermodules = $this->getPlugins();
            foreach($usermodules as $module) {
                //use require to load the file and import to add the functions
                require_once($module[0]);
                $class = $module[1];
                //only import when the file really defined the class named in its header
                if(class_exists($class)) $this->import(new $class);
            }
            $this->ID=$id;
	}
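
The header check is the part most likely to trip over odd files, so it helps to test it on plain strings first. A minimal sketch, assuming the header looks like /* plugin SomeClassName */; parse_plugin_header is a hypothetical name, not a function from the class above.

```php
<?php
// Hypothetical helper: returns the name from a "/* plugin Name */" header,
// or null when the text has no such header.
function parse_plugin_header($contents)
{
    if (preg_match('~/\*\s*plugin\s+(\w+)~i', $contents, $m)) {
        return $m[1];
    }
    return null;
}

$sample = "<?php\n/* plugin UserBogusPlugin */\nclass UserBogusPlugin extends MI_Importable {}\n";

var_dump(parse_plugin_header($sample));
var_dump(parse_plugin_header("<?php\n// no header here\n"));
```

Anchoring the pattern on the comment opener means a file that merely mentions the word 'plugin' somewhere in its code is not picked up by accident.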

sneak preview

I was working on a WordPress install for friends of mine. Marieke Zijlstra and Ruerdtsje made the layout, and I stuffed it into a template.

I had to learn a bit of CSS and jQuery and brush up on Photoshop.

pallieter small

We still have to add the sidebar slideshow and some pictures for the rotating header graphics. I’ll do that after it goes live, as the deadline was short. The CSS does not validate completely; fortunately there are hundreds of CSS experts who can iron out the glitches in the stylesheet.

Maybe I’ll dig into CSS some more later and fix it myself; it’s pretty basic stuff.

When the site is live I’ll add a link to it.