what is my website worth? (php fetching)

What is my puny blog worth? Here are two snippets (download) that grab estimates from two sites: the first from business-opportunities.biz, the second from webvaluer.org.

According to www.business-opportunities.biz my blog is worth 1,700 dollars. Their estimate is derived from the Technorati blog cosmos, so it only works for blogs that are registered there.

function what_my_site_is_worth_according_to_business_opportunities($url)
{
    //fetch the result page
    $html=file_get_contents('http://www.business-opportunities.biz/projects/how-much-is-your-blog-worth/submit/?url='.$url);
    //take what comes after 'is worth', up to the closing paragraph tag
    $cut1=explode('is worth',$html);
    $cut2=explode('</p>',$cut1[1]);
    $result=trim($cut2[0]);
    if(strlen($result)==0) return(0);
    return($result);
}

use:

echo what_my_site_is_worth_according_to_business_opportunities('www.juust.org');

In my case, that returns $1,693.62.

According to webvaluer.org my website is worth 7,549 US dollars. Their site returns a value for every type of site. In the page source there is a span tag around the text:

<h1>juust.org Estimated Worth <span style="color:green; font-weight:bold;">$7,549 USD</span></h1>

I cut the source before and after the span tag and use the PHP function strip_tags() to remove the tags, so only the text string remains.

function what_my_blog_is_worth_according_to_webvaluer($url)
{
    //fetch the result page
    $html=file_get_contents('http://www.webvaluer.org/us/'.$url);
    //take what comes after 'Estimated Worth', up to the closing h1 tag
    $cut1=explode('Estimated Worth', $html);
    $cut2=explode('</h1>',$cut1[1]);
    $result=trim($cut2[0]);
    if(strlen($result)==0) return(0);
    //strip the span tag, keep the text
    return(strip_tags($result));
}

use:

echo what_my_blog_is_worth_according_to_webvaluer('www.juust.org');

How to make your own php fetch function (a generic version follows the list):

  • check the result page html source
  • see what tag or phrase comes before the text you want (‘Estimated Worth’)
  • use explode() or split() to cut the source up in pieces on that phrase
  • take the piece after the phrase (arrays are 0-based, use $cut1[1] to get the second piece)
  • cut that up at the first tag (‘</h1>’) after the text you want
  • use the first piece ($cut2[0]) that holds the text you want
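Wrapped up as a reusable helper, the recipe looks like this (a sketch; the fetch_between name and the empty-string fallbacks are my own additions):

function fetch_between($url, $before, $after)
{
    //fetch the page source
    $html = file_get_contents($url);
    if ($html === false) return '';
    //take what comes after the leading phrase
    $cut1 = explode($before, $html);
    if (count($cut1) < 2) return '';
    //cut that off at the trailing tag, strip markup, keep the text
    $cut2 = explode($after, $cut1[1]);
    return strip_tags(trim($cut2[0]));
}

echo fetch_between('http://www.webvaluer.org/us/www.juust.org', 'Estimated Worth', '</h1>');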

It’s easier to use source splitting for this sort of fetching than regular expressions, which are better suited to retrieving generic patterns (email addresses and the like) from lots of pages (preg_match()), or an array of similar strings from one page (preg_match_all(), for serp pages or report tables).
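For comparison, the webvaluer lookup as a one-off regular expression (a sketch; the pattern assumes the span markup quoted above):

$html = file_get_contents('http://www.webvaluer.org/us/www.juust.org');
if (preg_match('/Estimated Worth\s*<span[^>]*>([^<]+)<\/span>/', $html, $match)) {
    echo $match[1]; //$7,549 USD
}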


how to keep a traffic log with mysql and php

A rather obsolete post. I hacked a quick referrer tracking script together because I wanted to see how Googlebot indexed my serp tool pages. I could use Google Analytics or phpMyVisites, but that is not as rewarding as keeping my own logs.

I start with a mysql table `log` to store the traffic data:

CREATE TABLE IF NOT EXISTS `log` (
  `id` bigint(11) NOT NULL auto_increment,
  `ip` varchar(20) NOT NULL,
  `page` varchar(120) NOT NULL,
  `visitpage` varchar(120) NOT NULL,
  `visitdate` varchar(25) NOT NULL,
  `agent` varchar(200) NOT NULL,
  `protocol` varchar(10) NOT NULL,
  `url` varchar(120) NOT NULL,
  `visitpath` varchar(80) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;

on every page I track, I start with:

session_start();
log_hit();

The function log_hit() relies on the PHP $_SERVER array that holds all data the web server collects:

function log_hit() {
	$ip = $_SERVER['REMOTE_ADDR'];

//if it is my own ip, exit the routine
	if($ip == "my.own.ip.here") return;

//what page is my visitor at ?
	$page = "http://".$_SERVER['HTTP_HOST'].$_SERVER['PHP_SELF'];

//what page did he come from ? (the referer header is not always set)
	$visitpage = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

	$visitdate = date("YmdHi");

//what useragent string is it (iow: is it googlebot and pals or an actual visitor ?)
	$agent = $_SERVER["HTTP_USER_AGENT"];

//the rest i dont use but i'll throw them in just for fun :
	$protocol = $_SERVER['SERVER_PROTOCOL'];
	$method = $_SERVER['REQUEST_METHOD'];
	$url = $_SERVER['REQUEST_URI'];
	$visitpath = parse_url($page, PHP_URL_PATH);

//if I have an ip, register the hit :
	if($ip) {
//connect to mysql
		$m = connect_data();

//escape the client-supplied values before they go into the query
		$page = mysql_real_escape_string($page, $m);
		$visitpage = mysql_real_escape_string($visitpage, $m);
		$agent = mysql_real_escape_string($agent, $m);
		$url = mysql_real_escape_string($url, $m);
		$visitpath = mysql_real_escape_string($visitpath, $m);

//and store it in the log table
		$myqry = "INSERT INTO `log` (
		`ip`, `page`, `visitpage`, `visitdate`, `agent`, `protocol`, `url`, `visitpath`) VALUES (
		'".$ip."', '".$page."', '".$visitpage."', '".$visitdate."', '".$agent."', '".$protocol."', '".$url."', '".$visitpath."')";
		mysql_query($myqry, $m) or die('store log :'.mysql_error());
		mysql_close($m);
	}
}

the connect_data() function is a standard mysql connection:

function connect_data() {
	$DB_USER = "user";
	$DB_PASSWORD = "password";
	$DB_HOST = "server";
	$DB_DATA = "database";
	$link = mysql_connect($DB_HOST, $DB_USER, $DB_PASSWORD) or die('connect :'.mysql_error());
	mysql_select_db($DB_DATA, $link) or die('select db :'.mysql_error());
	return $link;
}

It is far from perfect, but it works. It helps check the crawl-rate of pages. You can filter on the useragent ‘Googlebot’ and run a query grouping by date to see what times Googlebot comes by.
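For example, a query along these lines counts Googlebot hits per day (a sketch; it leans on the YmdHi format of the visitdate column):

SELECT LEFT(`visitdate`, 8) AS `day`, COUNT(*) AS `hits`
FROM `log`
WHERE `agent` LIKE '%Googlebot%'
GROUP BY `day`;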

//66.249.72.143 is GoogleBot
check_traffic_by_IP('66.249.72.143');

function check_traffic_by_IP($thisIP) {
	$m = connect_data();
	$qry = "SELECT * FROM `log` WHERE `ip` = '".$thisIP."' GROUP BY `visitdate`";
	$lst = mysql_query($qry, $m);
	echo '<table>';
	while($row=mysql_fetch_assoc($lst)) {
		echo '<tr><td>'.$row['visitdate'].'</td></tr>';
	}
	echo '</table>';
}

…or exclude all records where the useragent indicates it was a bot, and get a listing of the organic beings visiting the pages:

function check_real_traffic() {
	$m = connect_data();

//the bots
	$bots = array('googlebot', 'anonymouse');

	$qry = "SELECT * FROM `log`";
	$lst = mysql_query($qry, $m);

	echo '<table>';

	while($row=mysql_fetch_assoc($lst)) {

//check if the entries in the bots-array
//occur in the stored useragent string
		$traffic=-1;
		for($i=0;$i<count($bots);$i++) {
			if(strpos(strtolower($row['agent']), $bots[$i])!==false) $traffic+=1;
		}

//if it's not a bot (then the $traffic var is still -1), output the info
		if($traffic==-1) {
			echo '<tr><td>'.$row['visitdate'].'</td><td>'.$row['ip'].'</td><td>'.$row['page'].'</td></tr>';
		}
	}
	echo '</table>';
}

…that shows ‘organic’ visitors.
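The bot filter could also be pushed into the query itself, instead of looping in PHP (a sketch, reusing the same two useragent substrings):

SELECT `visitdate`, `ip`, `page`
FROM `log`
WHERE LOWER(`agent`) NOT LIKE '%googlebot%'
AND LOWER(`agent`) NOT LIKE '%anonymouse%'
ORDER BY `visitdate`;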

other resources
tek-tips.com: Limit Requests per day by IP

phpMyVisites: open source, free referrer and traffic analysis

parsing the google trends atom feed

Short and sweet: how to grab the google trends atom feed and parse out the links, in 15 lines.

$feed = simplexml_load_file('http://www.google.com/trends/hottrends/atom/hourly');
$children = $feed->children('http://www.w3.org/2005/Atom');
$parts = $children->entry;
foreach ($parts as $entry) {
	$details = $entry->children('http://www.w3.org/2005/Atom');
	//the entry content is html, so parse it with the dom extension
	$dom = new DOMDocument();
	@$dom->loadHTML($details->content);
	$anchors = $dom->getElementsByTagName('a');
	foreach ($anchors as $anchor) {
		$url = $anchor->getAttribute('href');
		$urltext = $anchor->nodeValue;
		echo 'Link: <a href="'.$url.'">'.$urltext.'</a> ';
	}
}

Requires PHP with SimpleXML and the DOM extension. You could use it for a blogfarm script but that’s about all I can think of.

edited 18-12-08: changed
$dom->loadHTML((string) $details->content);
to
@$dom->loadHTML($details->content);