how to keep a traffic log with MySQL and PHP

A rather obsolete post. I hacked a quick referrer-tracking script together because I wanted to see how Googlebot indexed my serp tool pages. I could use Google Analytics or phpMyVisites, but that is not as rewarding as keeping my own logs.

I start with a MySQL table `log` to store the traffic data:

  CREATE TABLE IF NOT EXISTS `log` (
    `id` bigint(11) NOT NULL auto_increment,
    `ip` varchar(20) NOT NULL,
    `page` varchar(120) NOT NULL,
    `visitpage` varchar(120) NOT NULL,
    `visitdate` varchar(25) NOT NULL,
    `agent` varchar(200) NOT NULL,
    `protocol` varchar(10) NOT NULL,
    `url` varchar(120) NOT NULL,
    `visitpath` varchar(80) NOT NULL,
    PRIMARY KEY (`id`)
  ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;

on every page I track, I start with

  session_start();
  log_hit();

The function log_hit() relies on the PHP $_SERVER array, which holds all the data the web server collects about the request:

  function log_hit() {
    $ip = $_SERVER['REMOTE_ADDR'];

    // if it is my own ip, exit the routine
    if ($ip == "my.own.ip.here") return;

    // what page is my visitor at?
    $page = "http://".$_SERVER['HTTP_HOST'].$_SERVER['PHP_SELF'];

    // what page did he come from? (not every request sends a referrer)
    $visitpage = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

    $visitdate = date("YmdHi");

    // what user-agent string is it (iow: is it Googlebot and pals or an actual visitor?)
    $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

    // the rest I don't use, but I'll throw them in just for fun:
    $protocol  = $_SERVER['SERVER_PROTOCOL'];
    $method    = $_SERVER['REQUEST_METHOD'];
    $url       = $_SERVER['REQUEST_URI'];
    $visitpath = parse_url($page, PHP_URL_PATH);

    // if I have an IP, register the hit:
    if ($ip) {
      // connect to mysql
      $m = connect_data();

      // and store it in the log table (escape the visitor-supplied values first)
      $myqry = "INSERT INTO `log` (`ip`, `page`, `visitpage`, `visitdate`, `agent`, `protocol`, `url`, `visitpath`) VALUES ('"
        .mysql_real_escape_string($ip, $m)."', '"
        .mysql_real_escape_string($page, $m)."', '"
        .mysql_real_escape_string($visitpage, $m)."', '"
        .$visitdate."', '"
        .mysql_real_escape_string($agent, $m)."', '"
        .$protocol."', '"
        .mysql_real_escape_string($url, $m)."', '"
        .mysql_real_escape_string($visitpath, $m)."')";
      mysql_query($myqry, $m) or die('store log: '.mysql_error());
      mysql_close($m);
    }
  }

the connect_data() function is a standard MySQL connection:

  function connect_data() {
    $DB_USER     = "user";
    $DB_PASSWORD = "password";
    $DB_HOST     = "server";
    $DB_DATA     = "database";
    $link = mysql_connect($DB_HOST, $DB_USER, $DB_PASSWORD) or die('connect: '.mysql_error());
    mysql_select_db($DB_DATA, $link) or die('select db: '.mysql_error());
    return $link;
  }
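
As an aside (not in the original script): the mysql_* extension has since been removed from PHP, so on a current setup the same helper would use mysqli. A minimal sketch, with the same placeholder credentials:

  // a minimal mysqli equivalent of connect_data(), sketched for newer PHP;
  // server, user, password and database are placeholders, as above
  function connect_data_mysqli() {
    $link = mysqli_connect("server", "user", "password", "database");
    if (!$link) {
      die('connect: '.mysqli_connect_error());
    }
    return $link;
  }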

It is far from perfect, but it works. It helps check the crawl rate of pages: you can filter on the user agent ‘Googlebot’ and run a query grouping by date to see what times Googlebot comes by, as in the listing and the sketch below.

  // 66.249.72.143 is Googlebot
  check_traffic_by_IP('66.249.72.143');

  function check_traffic_by_IP($thisIP) {
    $m = connect_data();
    $qry = "SELECT * FROM `log` WHERE `ip` = '".$thisIP."' GROUP BY `visitdate`";
    $lst = mysql_query($qry, $m);
    echo '<table><tbody>';
    while ($row = mysql_fetch_assoc($lst)) {
      echo '<tr><td>'.$row['visitdate'].'</td></tr>';
    }
    echo '</tbody></table>';
  }
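
That function only lists visits per IP; the user-agent-by-date grouping mentioned above is not in the original code. A rough sketch of it, assuming the same `log` table and the YmdHi format that log_hit() writes into `visitdate` (the function name is just made up):

  // sketch: count Googlebot hits per day by matching the stored user-agent string
  // and grouping on the date part of `visitdate` (stored as YmdHi, so the first
  // 8 characters are the day; take more characters to drill down to the hour)
  function check_crawlrate_googlebot() {
    $m = connect_data();
    $qry = "SELECT LEFT(`visitdate`, 8) AS `day`, COUNT(*) AS `hits`
            FROM `log`
            WHERE LOWER(`agent`) LIKE '%googlebot%'
            GROUP BY `day`
            ORDER BY `day`";
    $lst = mysql_query($qry, $m) or die('crawl rate: '.mysql_error());
    echo '<table><tbody>';
    while ($row = mysql_fetch_assoc($lst)) {
      echo '<tr><td>'.$row['day'].'</td><td>'.$row['hits'].'</td></tr>';
    }
    echo '</tbody></table>';
    mysql_close($m);
  }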

…or exclude all records where the user agent indicates it was a bot, and get a listing of the organic beings visiting the pages:

  function check_real_traffic() {
    $m = connect_data();

    // the bots
    $bots = array('googlebot', 'anonymouse');

    $qry = "SELECT * FROM `log`";
    $lst = mysql_query($qry, $m);

    echo '<table><tbody>';

    while ($row = mysql_fetch_assoc($lst)) {

      // check if the entries in the bots array
      // occur in the stored user-agent string
      $traffic = -1;
      for ($i = 0; $i < count($bots); $i++) {
        if (strpos(strtolower($row['agent']), $bots[$i]) !== false) $traffic += 1;
      }

      // if it's not a bot (then the $traffic var is still -1), output the info
      if ($traffic == -1) {
        echo '<tr><td>'.$row['visitdate'].'</td><td>'.$row['ip'].'</td><td>'.$row['page'].'</td></tr>';
      }
    }
    echo '</tbody></table>';
  }

…that shows ‘organic’ visitors.
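
Calling it from whatever stats or admin page you use is simply:

  check_real_traffic();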

other resources
tek-tips.com: Limit Requests per day by IP

phpMyVisites: open source, free referrer and traffic analysis


One Comment

  1. Hi, I was just wondering how you would manage a table getting really big really quickly…say if your site gets around 200k page views per month… I mean after 6 months you’d probably have around a million entries in that table…and it could all turn pretty ugly really quickly…

    Some tips on this would be greatly appreciated…

    Thanks
    Adriaan
