google trends II

I wanted to reply to a question elsewhere on the site, but a ‘comment’ box isn’t fit for it so I’ll put the reply here. The question was about creating ‘search engine friendly’ descriptive URL’s based on keywords from the Google Trends atom feed, listing pages a graph of the trend.

You can get a site to list http://domain.com/trend_title.html type url’s by using mod_rewrite, an apache module.

In the server directory of the application you can use an .htaccess file to set rules for file access in these folders. When the server gets request from browsers or servers it applies any rewriting rules you define in .htaccess to these requests.

I tried this one :


	RewriteEngine On
	RewriteCond %{REQUEST_FILENAME} !-f
	RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule ^(.*).html /trendinfo.php?title=$1 

RewriteEngine On
sets the rewrite mechanism on

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

tell the apache server that rewriteconditions apply to file-requests that are not an existing file (F) or directory (D). If the requested filename is anywhere in the servers file table, the server dishes out that file, otherwise it will try to apply a RewriteRule. Applying the rule generates a new request, if that returns anything, the server dishes that out, otherwise it returns an htpp-404 ‘file not found’.

The actual url rewrite rule is :
RewriteRule ^(.*).html /trendinfo.php?title=$1
which means :

  • if any filename is requested that satisfies the mask ^(.*).html then
  • take everything before .html
  • add that as variable $1 to trendinfo.php?title=$1
  • see if it sticks

If the browser requests http://domain.com/bob+bowersox.html, the server will assert it is not a file or directory on the server, and test the available rules. When it notices it the requested file ends with .html, it applies the rewrite rule and tries to access http://domain.com/trendinfo.php?title=bob+bowersox.

A browsing user does not notice a thing.

In trendinfo.php I wrote some code to handle the ‘new’ request :

if(!isset($_REQUEST['title'])) {
//if there is no $1, added as title, fake a 404 "file not found" message 
        echo 'the emptiness...';
} else {
//get the title from the request
  $mytitle=htmlentities($_REQUEST['title'], ENT_QUOTES, "UTF-8");
//put the google trends graph url together
  $graphurl = 'http://www.google.com/trends/viz?hl=&q=';
  $graphurl .= urlencode($mytitle);
  $graphurl .= '&date=';                        //leave date blank to get the current graph
  $graphurl .= '&graph=hot_img&sa=X';
  echo "";
}

…that outputs the Google trend graph on the url http://domain.com/bob+bowersox.html

You can also put this in index.php :

		$feed = simplexml_load_file('http://www.google.com/trends/hottrends/atom/hourly');
		$children =  $feed->children('http://www.w3.org/2005/Atom');
		$parts = $children->entry;
		foreach ($parts as $entry) {
		  	$details = $entry->children('http://www.w3.org/2005/Atom');
	 	 	 $dom = new domDocument(); 
		 	 $html=$details->content;
		 	 @$dom->loadHTML($html); 
		  	 $anchors = $dom->getElementsByTagName('a'); 
				foreach ($anchors as $anchor) { 
		 			$url = $anchor->getAttribute('href'); 
	 				$urltext = $anchor->nodeValue;
					echo ''.$urltext.' ';
				}
			}
			unset($dom);
			unset($anchors);
			unset($parts);
			unset($feed);

That lists the current 100 google trends with a link. If you use the .htaccess rewrite rules, the server reroutes all the links to trendinfo.php with descriptive urls.

I hope that helps.

an xml rpc endpoint

(geek content:) Integrating the Incutio xml rpc class into a phpLinkDirectory install opens a lot of possibilities for running remote control automated networks. For a basic example I took the submit routine of phpLD and the xml-rpc routine from wordpress, deleted all nonsense, and ended up with a simple xml-rpc endpoint for my link directory.

On the sender side, I make an xml file that holds the methodName (which is the function I want to execute remotely : phpld.SubmitLink), and the array values I want to pass to the function as parameters. I attach the xml-string as post to a curl call and fire it at the xmlrpc endpoint.

function getmyxml() {

//make the $data array
//normally phpLd makes it when someone submits a site
	$data['LINK_TYPE']=1;
	$data['DESCRIPTION']='DESCRIPTION';
	$data['TITLE']='TITLE';
	$data['OWNER_NAME']='OWNER_NAME';
	$data['URL']='http://www.domain.com/xmlrpc.php';
	$data['ID']='';

//put the data array in an xml string to post to the xmlrpc endpoint

//make the xml header
	$myxml='< ?xml version="1.0" encoding="UTF-8"?>';
	$myxml.='< methodCall>';
	$myxml.=	'< methodName>phpld.SubmitLink< /methodName>';
	$myxml.='< params>';
	
//loop to add the $data elements as param-tags
	foreach($data as $d) {
		$myxml.='< param>';
		$myxml.='< value>< string>'.trim($d).'< /string>< /value>';
		$myxml.='< /param>';
	}	

//finish the xml file :
	$myxml.='< /params>';
	$myxml.='< /methodCall>';

//return it
	return $myxml;
}

//make the call to the endpoint 
    $ch = curl_init('http://www.domain.com/xmlrpc.php');
//use content-type text/xml as extra header
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: text/xml'));
//get the xml-string and use it as post var
    curl_setopt($ch, CURLOPT_POSTFIELDS, getmyxml());
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 1);
    $return = curl_exec($ch);
    unset($ch);
}

So much for the sending end, now the receiving end : the xmlrpc.php endpoint file. I need the Incutio xml handler and the phpLinkdirectory database class, with the proper settings, and a function SubmitLink to add the data in the posted xml-file as a record to the database.

Some general settings :

define('XMLRPC_REQUEST', true);
$_COOKIE = array();
if ( !isset( $HTTP_RAW_POST_DATA ) ) $HTTP_RAW_POST_DATA = file_get_contents( 'php://input' );
if ( isset($HTTP_RAW_POST_DATA) )  $HTTP_RAW_POST_DATA = trim($HTTP_RAW_POST_DATA);

I include the Incutio class file, which is a general xml client/server class, and I include init.php from the phpLD script. In the init.php script the AdoDb database instance is declared, that handles the mysql connection, as well as the basic phpLd environment.

	
include_once('include/class-IXR.php');
include_once('init.php');
$xmlrpc_logging = 0;

Let’s put them both to work for me : the IXR_Server has to know it has to run the SubmitLink function if that function is set as methodName in the xml file. In php you can add methods dynamically to a class, and in the function call($methodname, $args) the IXR_Server handles callbacks.

In my phpLD_xmlrpc_server class, which extends IXR_Server, I add a function SubmitLink that stuffs the data in the phpld database, and I make an array with xml.methodName=>class:function as key=>value pair and pass that to the IXR_Server, that’s enough.

class phpLD_xmlrpc_server extends IXR_Server {

	var $methods=array();

	function phpLD_xmlrpc_server() {

//mapping the custom methods
	   $this->methods['phpld.SubmitLink'] = 'this:SubmitLink';

//handing the custom methods to the base class
	   $this->IXR_Server($this->methods);
	}

When the IXR class parses the posted xml and finds the phpld.SubmitLink methodName, it executes the custom method ($this->)SubmitLink with the data posted in the xml param tags as input :


function SubmitLink($params) {

//use db and tables from include(init.php)
	global $db;
	global $tables;

//map the param-values passed to the $data array 
	$data['LINK_TYPE']=$params[0];
	$data['DESCRIPTION']=$params[1];
	$data['TITLE']=$params[2];
	$data['OWNER_NAME']=$params[3];
	$data['URL']=$params[4];
	$data['ID']='';
		
//pass the data array to the adodb $db instance's Replace function : 
		if ($db->Replace($tables['link']['name'], $data, 'ID', true) > 0) {
			return $data['DESCRIPTION']." entered";
		} else {
			return " refused";
		}
	}

The AdoDb function Replace maps key-value to tablefield-value, so I use the database field names as keys for the $data array, and assign the $param-values to them. I pass the $data array to the adoDb function and that takes care of the rest.

After adding the includes, settings, and classes,
all I have to do is add a final call at the end to start a new instance of the extended class to handle the posted xml-data :

$phpLD_xmlrpc_server = new phpLD_xmlrpc_server();

…and I have an xml-rpc endpoint :

define('XMLRPC_REQUEST', true);
$_COOKIE = array();
if ( !isset( $HTTP_RAW_POST_DATA ) ) $HTTP_RAW_POST_DATA = file_get_contents( 'php://input' );
if ( isset($HTTP_RAW_POST_DATA) ) $HTTP_RAW_POST_DATA = trim($HTTP_RAW_POST_DATA);
	
include_once('include/class-IXR.php');
include_once('init.php');

$xmlrpc_logging = 0;

class phpLD_xmlrpc_server extends IXR_Server {

	var $methods=array();
	
	function phpLD_xmlrpc_server() {
	   $this->methods['phpld.SubmitLink'] = 'this:SubmitLink';
		$this->IXR_Server($this->methods);
	}

	function SubmitLink($args) {

	global $db;
	global $tables;

	$data['STATUS']         = $args[0];
	$data['IPADDRESS']      = $args[1];
	$data['VALID']          = $args[2];
	$data['LINK_TYPE']      = $args[3];

		if ($db->Replace($tables['link']['name'], $data, 'ID', true) > 0) {
			return $data['DESCRIPTION']." entered";
		} else {
			return " refused";
		}
	}
}
$phpLD_xmlrpc_server = new phpLD_xmlrpc_server();

Short and sweet.

The actual strength of xml-rpc is in defining a set of standard methodNames and parameters passed with the methodCall (for instance for small website maintenance tasks and statistics reporting, an rpc.sms protocol as successor to Twitter, or standard socialgraph functions for ajax/javascript/widgets) for developing standard API’s.

google trends crosstab

I was playing with Google Trends using another blog. Of the seven or eight posts I tried, five made it to the details page, which gave me some traffic. I am too lazy to follow 100 entries during the day, so I want a simple report on how trends develop the past day like this :

google trends watch

Now how do I get that done ?

First I make a cronjob that runs a routine to pull the Google trends xml-feed in every hour and store it in a database. I add a string for the period (year-month-day-hour).

That gives me a table like this one :

DATE SEARCHID POS
2008120101 baboon 78
2008120101 monkey 13
2008120102 baboon 98
2008120102 monkey 5
2008120103 monkey 3

I want an output

2008120101 2008120102 2008120103
baboon 78 98
monkey 13 5 3

The way to do that is a crosstable query, also called a pivot table .

MySql pivot table

I have to make a routine to turn that base table into a table with the dates as column, and per phrase the position in the correct column. I want a complete row, so I take the highest period from the table and the lowest, and from that intrapolate the other periods.

function make_periods($begin, $end) {
$BeginTime=mktime(substr($begin, 8, 2),0,0, substr($begin,4,2), substr($begin,6,2), substr($begin,0,4));
$EndTime=mktime(substr($end,8,2),0,0, substr($end,4,2), substr($end,6,2), substr($end,0,4));
//divide the difference by 60minutes*60seconds
$periods = ($EndTime-$BeginTime) / 3600;
//make a row of hour-periods with "+N hour",
for($i=0;$i< ($periods+1);$i++) $myperiods[] = strftime('%Y%m%d%H', strtotime("+$i hour", $BeginTime));
//return an array with all periods 
//that are to be the column headers in the crosstable
	return $myperiods;
}

At some point I have to clean up the table, and I want the data to fit in one page,
so I will use 24 periods

$plength=25;
if(count($dates)>$plength) {
	for($j=0;$j< ($plength+1);$j++) $usedates[] = $dates[count($dates) - ($plength-$j)];
} else {
//note : this one needs some work.
     $usedates[]=$dates;
}

... and delete all older records :

//make the periods
$dates=array();
$dates = make_periods($begin, $end);

//use the $dates array, see if I have more than 24 periods
if(count($dates) > 24) {
//get period number-last-minus-24
//delete all records before that period
  mysql_query("DELETE FROM `serp_trends`  WHERE `date`< '".$dates[count($dates)-24] ."'", $link);
}

...that keeps the table small enough for quick querying. I have 24 periods to make columns, how do I get the position as value in each column ? I read the 'wizard' article on mysql pivot tables and that sounded difficult.

I don't get it, but I got it running :

  • group the records by phrase ("searchid")
  • use the array of periods as columns
  • see if the phrase has a position in a specific period,
    • if so, take the position as value (it only has one per period)
    • otherwise take 0 as value
  • sum the values
  • name the column "T"+period
$usedates=$dates;
$CROSTAB = "SELECT `searchid`, AVG(`pos`) AS AP ";
	for($j=0;$j

I take the average position from the grouped records, order it ascending and output that as table, first the two columns phrase (searchid) and average position, then the periods as column names.

$link=connect_data();

$result=mysql_query($CROSTAB, $link) or die(mysql_error());

//add the zebra class marker for mootool
$CTB.= '';

//make the header row
$CTB.='';

for($j=0;$j';
	for($j=0;$j'; }
	}
	$CTB.='';
}
$CTB.='
avgsearch'.$row['searchid'].'
';

When I echo $CTB, the crosstable, this comes out :

google trends watch

Seeing rows per phrase means I can check a lot easier which trends are running, rising, dropping, or even returning. And that was what I wanted.

I zipped the files for download.