google trends crosstab

I was playing with Google Trends using another blog. Of the seven or eight posts I tried, five made it to the details page, which gave me some traffic. I am too lazy to follow 100 entries during the day, so I want a simple report on how trends develop the past day like this :

google trends watch

Now how do I get that done ?

First I make a cronjob that runs a routine to pull the Google trends xml-feed in every hour and store it in a database. I add a string for the period (year-month-day-hour).

That gives me a table like this one :

DATE SEARCHID POS
2008120101 baboon 78
2008120101 monkey 13
2008120102 baboon 98
2008120102 monkey 5
2008120103 monkey 3

I want an output

2008120101 2008120102 2008120103
baboon 78 98
monkey 13 5 3

The way to do that is a crosstable query, also called a pivot table .

MySql pivot table

I have to make a routine to turn that base table into a table with the dates as column, and per phrase the position in the correct column. I want a complete row, so I take the highest period from the table and the lowest, and from that intrapolate the other periods.

function make_periods($begin, $end) {
$BeginTime=mktime(substr($begin, 8, 2),0,0, substr($begin,4,2), substr($begin,6,2), substr($begin,0,4));
$EndTime=mktime(substr($end,8,2),0,0, substr($end,4,2), substr($end,6,2), substr($end,0,4));
//divide the difference by 60minutes*60seconds
$periods = ($EndTime-$BeginTime) / 3600;
//make a row of hour-periods with "+N hour",
for($i=0;$i< ($periods+1);$i++) $myperiods[] = strftime('%Y%m%d%H', strtotime("+$i hour", $BeginTime));
//return an array with all periods 
//that are to be the column headers in the crosstable
	return $myperiods;
}

At some point I have to clean up the table, and I want the data to fit in one page,
so I will use 24 periods

$plength=25;
if(count($dates)>$plength) {
	for($j=0;$j< ($plength+1);$j++) $usedates[] = $dates[count($dates) - ($plength-$j)];
} else {
//note : this one needs some work.
     $usedates[]=$dates;
}

... and delete all older records :

//make the periods
$dates=array();
$dates = make_periods($begin, $end);

//use the $dates array, see if I have more than 24 periods
if(count($dates) > 24) {
//get period number-last-minus-24
//delete all records before that period
  mysql_query("DELETE FROM `serp_trends`  WHERE `date`< '".$dates[count($dates)-24] ."'", $link);
}

...that keeps the table small enough for quick querying. I have 24 periods to make columns, how do I get the position as value in each column ? I read the 'wizard' article on mysql pivot tables and that sounded difficult.

I don't get it, but I got it running :

  • group the records by phrase ("searchid")
  • use the array of periods as columns
  • see if the phrase has a position in a specific period,
    • if so, take the position as value (it only has one per period)
    • otherwise take 0 as value
  • sum the values
  • name the column "T"+period
$usedates=$dates;
$CROSTAB = "SELECT `searchid`, AVG(`pos`) AS AP ";
	for($j=0;$j

I take the average position from the grouped records, order it ascending and output that as table, first the two columns phrase (searchid) and average position, then the periods as column names.

$link=connect_data();

$result=mysql_query($CROSTAB, $link) or die(mysql_error());

//add the zebra class marker for mootool
$CTB.= '';

//make the header row
$CTB.='';

for($j=0;$j';
	for($j=0;$j'; }
	}
	$CTB.='';
}
$CTB.='
avgsearch'.$row['searchid'].'
';

When I echo $CTB, the crosstable, this comes out :

google trends watch

Seeing rows per phrase means I can check a lot easier which trends are running, rising, dropping, or even returning. And that was what I wanted.

I zipped the files for download.

what is my website worth ? (php fetching)

What is the worth of my puny blog ? Here’s two snippets (download) to grab estimates from two sites. The first one is business-opportunities.biz, the second webvaluer.org.

According to www.business-opportunities.biz my blog is worth 1700 dollar. Their estimate is derived from the Technorati blog cosmos so it only works for blogs that are registered there.

function what_my_site_is_worth_according_to_business_opportunities($url)
{
    $html=file_get_contents('http://www.business-opportunities.biz/projects/how-much-is-your-blog-worth/submit/?url='.$url);
    $cut1=explode('is worth',$html);
    $cut2=explode('< /p>',$cut1[1]);
    $result=trim($cut2[0]);
    if(strlen($result)==0) return(0);
    return($result);
} 

use

echo what_my_site_is_worth_according_to_business_opportunities('www.juust.org');

That returns in my case $1,693.62

According to webvaluer.org my website is worth $7,549 US dollar. Their site returns a value for every type of site. In the page source there is a span format tag around the text

< h1>juust.org Estimated Worth < span style="color:green; font-weight:bold;">$7,549 USD< /span>< /h1>

I cut it before and after the span tag, and use the php function strip_tags() to remove the span tag, and only the text string remains.

function what_my_blog_is_worth_according_to_webvaluer($url)
{
    $html=file_get_contents('http://www.webvaluer.org/us/'.$url);
    $cut1=explode('Estimated Worth', $html);
    $cut2=explode('< /h1>',$cut1[1]);
    $result=(trim($cut2[0]));
    if(strlen($result)==0) return(0);
    return(strip_tags($result));
} 

use :

echo what_my_blog_is_worth_according_to_webvaluer('www.juust.org');

How to make your own php fetch function :

  • check the result page html source
  • see what tag or phrase comes before the text you want (‘Estimated Worth’)
  • use explode() or split() to cut the source up in pieces on that phrase
  • take the piece after the phrase (arrays are 0-based,use cut1[1] to get the second piece)
  • cut that up at the first tag (‘< /h1>‘) after the text you want
  • use the first piece (cut2[0]) that holds the text you want

It’s easier to use source splitting for this sort of fetching than regular expressions, which is sooner meant to retrieve generic patterns (email addresses and stuff) from lots of pages (preg_match()), or an array of similar string from one page (preg_match_all(), for serp pages or report tables).

note : my php code highlighter refuses to display html tags in functions proper so in the code I added a blank space : $cut2=explode(‘< /p>‘,$cut1[1]);
Remove the blanks from the html tag in the function for it to work proper or use the text source
.

how to keep a traffic log with mysql and php

A rather obsolete post. I hacked a quick referrer tracking script together because I wanted to see how Googlebot indexed my serp tool pages. I can use Google analytics or phpMyVisites, but that is not as rewarding as keeping my own logs.

I start with a mysql table LOG to store the traffic data

CREATE TABLE IF NOT EXISTS `log` (
  `id` bigint(11) NOT NULL auto_increment,
  `ip` varchar(20) NOT NULL,
  `page` varchar(120) NOT NULL,
  `visitpage` varchar(120) NOT NULL,
  `visitdate` varchar(25) NOT NULL,
  `agent` varchar(200) NOT NULL,
  `protocol` varchar(10) NOT NULL,
  `url` varchar(120) NOT NULL,
  `visitpath` varchar(80) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;

on every page I track, I start with

session_start();
log_hit();

The function log_hit() relies on the php $_SERVER array that holds all data the web server collects:

function log_hit();
        $ip   = $_SERVER['REMOTE_ADDR'];

//if it is my own ip, exit the routine
	if($ip =="my.own.ip.here") return;

//what page is my visitor at ?
	$page  = "http://".$_SERVER['HTTP_HOST'].$_SERVER['PHP_SELF'];

//what page did he come from ?
	$visitpage  = $_SERVER['HTTP_REFERER'];

	$visitdate  = date("YmdHi");

//what user-string is it (iow: is it googlebot and pals or an actual visitor ?)
	$agent      = $_SERVER["HTTP_USER_AGENT"];        

//the rest i dont use but i'll throw them in just for fun :
	$protocol   = $_SERVER['SERVER_PROTOCOL'];
	$method    = $_SERVER['REQUEST_METHOD'];
	$url           = $_SERVER['REQUEST_URI'];
	$visitpath   = parse_url($page, PHP_URL_PATH);

	if(!$ip) {} else {
//if I have an Ip, register the hit :
//connect to mysql table 
		$m=connect_data();

//and store it in the LOG table
		$myqry="INSERT INTO `serp_log` (
		`ip` ,`page` ,`visitpage` ,`visitdate` ,`agent` ,`protocol` ,`url` ,`visitpath` ) VALUES (
		'".$ip."', '".$page."', '".$visitpage."', '".$visitdate."', '".$agent."', '".$protocol."', '".$url."', '".$visitpath."')";
		mysql_query($myqry, $m) or die('store log :'.mysql_error());
		mysql_close($m);
}

the connect_data() function is standard mysql connection

function connect_data() {
		$DB_USER =  "user";
		$DB_PASSWORD = "password";
		$DB_HOST = "server";
		$DB_DATA = "database";
		$link =  mysql_connect($DB_HOST, $DB_USER, $DB_PASSWORD) or $error = mysql_error();
		if (!$link) {
	    	return $error; 
		} 	 
	    mysql_select_db($DB_DATA, $link) or $error = mysql_error();
		return $link;
	}

It is far from perfect, but it works. It helps check the crawl-rate of pages. You can filter out the useragent ‘Googlebot’ and run a query grouping useragent by date and see what times Googlebot comes by.

//66.249.72.143 is GoogleBot
check_traffic_by_IP('66.249.72.143' );

function check_traffic_by_IP($thisIP) {
	$m = connect_data();
	$qry = "SELECT * from `log` WHERE `ip` = '".$thisIP."' GROUP BY `visitdate`";
	$lst = mysql_query($qry, $m);
	echo '';
	while($row=mysql_fetch_assoc($lst)) {
		echo '';
	}
	echo '
'.$row['visitdate'].'
'; }

…or exclude all records if the UserAgent indicates it was a Bot, and get a listing of organic beings visiting the pages :

function check_real_traffic() {
	$m = connect_data();

//the bots
	$bots = array('googlebot', 'anonymouse');

	$qry = "SELECT * from `log`";
	$lst = mysql_query($qry, $m);

	echo '';

	while($row=mysql_fetch_assoc($lst)) {

//check if the entries in the bots-array
//occur in the stored useragent string 
		$traffic=-1;
		for($i=0;$i0) $traffic+=1;
		} 

//if it's not a bot (then the $traffic var is still -1), output the info
		if($traffic==-1) {
               echo '';
                }
	}
	echo '
'.$row['visitdate'].''.$row['ip'].''.$row['page'].'
'; }

…that shows ‘organic’ visitors.

other resources
tek-tips.com/ Limit Requests per day by IP

phpMyVisites open source free referrer and traffic analysis