Categories
google seo tips and tricks

google suggest scraper (php & simplexml)

Today’s goal is a basic php Google Suggest scraper because I wanted traffic data and keywords for free.

Before we start :

google scraping is bad !

Good people use the Google AdWords API : 25 cents per 1000 units and 15+ units per keyword suggestion, so they pay 4 or 5 dollars for 1000 keyword suggestions (if they can find a good programmer, which also costs a few dollars). Or they opt for SemRush (also my preference), KeywordSpy, Spyfu, and other services like 7Search PPC programs to get keyword and traffic data and data on their competitors, but these charge from about 80 dollars per month for a limited account up to a few hundred per month for SEO companies. Good people pay plenty.

We tiny grey webmice of marketing however just want a few estimates, at low or better no cost : like this :

data                            num queries
google suggest                  57800000
google suggestion box           5390000
google suggest api              5030000
google suggestion tool          3670000
google suggest a site           72700000
google suggested users          57000000
google suggestions funny        37400000
google suggest scraper          62800
google suggestions not working  87100000
google suggested user list      254000000

Suggestion autocomplete is AJAX, it outputs XML :

<?xml version="1.0"?>
   <toplevel>
     <CompleteSuggestion>
       <suggestion data="senior quotes"/>
       <num_queries int="30000000"/>
     </CompleteSuggestion>
     <CompleteSuggestion>
       <suggestion data="senior skip day lyrics"/>
       <num_queries int="441000"/>
     </CompleteSuggestion>
   </toplevel>

Using SimpleXML, the PHP routine is as simple as querying g00gle.c0m/complete/search?, grabbing the autocomplete xml, and extracting the attribute data :

 
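        if ($_SERVER['QUERY_STRING'] == '') die('enter a query like http://host/filename.php?query');
        //the raw query string is the keyword (filename.php?query)
        $kw = urldecode($_SERVER['QUERY_STRING']);
        $contentstring = @file_get_contents("http://g00gle.c0m/complete/search?output=toolbar&q=".urlencode($kw));
        if ($contentstring === false) die('could not fetch the suggestions');
        $content = simplexml_load_string($contentstring);
        if ($content === false) die('could not parse the suggestion xml');

        foreach ($content->CompleteSuggestion as $c) {
            $term = (string) $c->suggestion->attributes()->data;
            //note : traffic data is sometimes missing
            $traffic = isset($c->num_queries) ? (string) $c->num_queries->attributes()->int : '';
            echo $term." ".$traffic."\n";
        }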
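
To try the parsing step without hitting the network, here is a minimal sketch that runs the same SimpleXML walk over a hard-coded copy of the XML shown earlier. The helper name parse_suggestions() is made up for this example, and the third sample entry (without num_queries) is invented to show the "traffic data is sometimes missing" case.

```php
<?php
// Sample response copied from the toolbar output shown earlier, plus a third
// entry (made up for this sketch) that has no num_queries element.
$sample = <<<XML
<?xml version="1.0"?>
<toplevel>
  <CompleteSuggestion>
    <suggestion data="senior quotes"/>
    <num_queries int="30000000"/>
  </CompleteSuggestion>
  <CompleteSuggestion>
    <suggestion data="senior skip day lyrics"/>
    <num_queries int="441000"/>
  </CompleteSuggestion>
  <CompleteSuggestion>
    <suggestion data="senior example term"/>
  </CompleteSuggestion>
</toplevel>
XML;

// parse_suggestions() is a hypothetical helper name, not part of any library.
function parse_suggestions($xmlString)
{
    $rows = array();
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        return $rows; // unparsable input: return an empty list
    }
    foreach ($xml->CompleteSuggestion as $c) {
        $term = (string) $c->suggestion['data'];
        // num_queries is optional, so only read it when the element exists
        $traffic = isset($c->num_queries) ? (int) $c->num_queries['int'] : null;
        $rows[] = array('term' => $term, 'traffic' => $traffic);
    }
    return $rows;
}

// print one "term traffic" line per suggestion, "-" when traffic is missing
foreach (parse_suggestions($sample) as $row) {
    echo $row['term'], ' ', ($row['traffic'] === null ? '-' : $row['traffic']), "\n";
}
```

Returning an array instead of echoing directly makes it easy to sort the terms by traffic or feed them into another query.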

I made a quick php script that outputs the terms as a list of new queries so you can walk through the suggestions :

The source is up for download as a text file over here (rename it to suggestit.php and it should run on any server with php5.* and simplexml).
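
The download itself is not reproduced here, so the following is only a rough sketch of the "list of new queries" idea, not the actual script. The helper name suggestion_links() is hypothetical, and it assumes the receiving script reads the keyword from the raw query string and url-decodes it.

```php
<?php
// suggestion_links() is a hypothetical helper; "suggestit.php" is the file
// name suggested above, and the keyword travels as the raw query string.
function suggestion_links(array $terms, $script = 'suggestit.php')
{
    $html = "<ul>\n";
    foreach ($terms as $term) {
        // encode the term for the URL, then escape everything for HTML output
        $href = $script . '?' . rawurlencode($term);
        $html .= '  <li><a href="' . htmlspecialchars($href) . '">'
               . htmlspecialchars($term) . "</a></li>\n";
    }
    return $html . "</ul>\n";
}

echo suggestion_links(array('google suggest api', 'google suggest scraper'));
```

Each link calls the script again with the suggested term as the new query, so clicking through walks the suggestion tree one level at a time.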

Categories
google php

ga api sample : get pageviews

I was going to put that online : how to get the pageviews out of the google analytics api, using simplexml and php. Google uses three namespaces in the output feed, which makes it less easily accessible, so here’s a quick sample of how to get your site’s pageviews out of it :

//ids           = site identifier (from the site data feed)
//metrics     = what i want to see
//start-date 
//end-date 

$feedUri = "https://www.google.com/analytics/feeds/data?ids=ga:10516419&metrics=ga:pageviews&start-date=2009-04-01&end-date=2009-05-01"; 			

	$curl = curl_init();
	curl_setopt($curl, CURLOPT_URL, $feedUri);
	curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 3);
	curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

       $headers[] = "Authorization: GoogleLogin auth=".$Authtoken;

//for authtoken : see previous post
	curl_setopt($curl, CURLOPT_HTTPHEADER, $headers); 
	curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
	curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
	curl_setopt($curl, CURLOPT_VERBOSE, 1);

//get the string containing the xml file
	$gA = curl_exec($curl);
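
The feed URL above is typed out by hand. As a small sketch, the same URL can be assembled from its four parameters with http_build_query(); the helper name build_feed_uri() is made up here, and the base URL and parameter names are simply the ones from the sample. Note that http_build_query() percent-encodes the colon in "ga:" as %3A, which I assume the server decodes like any other query parameter.

```php
<?php
// build_feed_uri() is a hypothetical helper. The base URL and the parameter
// names (ids, metrics, start-date, end-date) are the ones used in the sample.
function build_feed_uri($profileId, $metric, $startDate, $endDate)
{
    $params = array(
        'ids'        => 'ga:' . $profileId,   // site identifier
        'metrics'    => 'ga:' . $metric,      // what i want to see
        'start-date' => $startDate,           // YYYY-MM-DD
        'end-date'   => $endDate,             // YYYY-MM-DD
    );
    // force "&" as separator so the result does not depend on php.ini
    return 'https://www.google.com/analytics/feeds/data?' . http_build_query($params, '', '&');
}

echo build_feed_uri('10516419', 'pageviews', '2009-04-01', '2009-05-01'), "\n";
```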

The feed has three namespaces (atom, opensearch and dxp/analytics). A simple way in is to access the ENTRY tags (from the Atom namespace); each entry holds one DXP: line, and that line has the answer to the question.

<dxp:metric confidenceInterval='0.0' name='ga:pageviews' type='integer' value='755'/>

//load the string into a simple xml object
	$feed = simplexml_load_string($gA);

//take the atom namespace
	$children =  $feed->children('http://www.w3.org/2005/Atom');

//take the entry tags
	$parts = $children->entry;
	foreach ($parts as $entry) {

        //from the entry tag,
        //access the dxp namespace
		$dxp = (object) $entry->children('http://schemas.google.com/analytics/2009');

        //METRIC contains the answer to the question
        //grab from the tag METRIC the attribute VALUE
                echo   (string) $dxp->metric->attributes()->value;

        }

The (string) typecast is important : normally simplexml returns a SimpleXMLElement object; when you force a string type, it gives the actual value attribute of the ga:pageviews metric as a numeric string.
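
To see the namespace walk and the typecast in isolation, here is a small offline sketch. The sample feed is a stripped-down stand-in I wrote for this example (one entry, one dxp:metric, using the two namespace URIs from the code above); a real feed contains many more elements.

```php
<?php
// Trimmed-down stand-in for the data feed: one entry with one dxp:metric.
// Real feeds carry many more elements; this is just enough to show the walk.
$sample = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dxp="http://schemas.google.com/analytics/2009">
  <entry>
    <dxp:metric confidenceInterval="0.0" name="ga:pageviews" type="integer" value="755"/>
  </entry>
</feed>
XML;

$feed = simplexml_load_string($sample);
$total = 0;

// entry lives in the Atom namespace, metric in the dxp namespace
foreach ($feed->children('http://www.w3.org/2005/Atom')->entry as $entry) {
    $dxp = $entry->children('http://schemas.google.com/analytics/2009');
    $raw = $dxp->metric->attributes()->value; // still a SimpleXMLElement
    $value = (string) $raw;                   // now a plain string
    $total += (int) $value;                   // cast again to do arithmetic
}

echo $total, "\n";
```

Summing over the entries is handy once the query returns more than one row, for example when a dimension is added to the request.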

Categories
google php

google analytics have an api !

[note: over at ioncannon Carson McDonald made a cool google analytics plugin for wordpress, i use it on this blog, works fine].

An actual google analytics api, and I missed out on it. This api is already a month old and I haven’t read anything on the blogs about it.

I found it half an hour ago. I haven’t checked it completely, but it looks promising. Here is the first bit, basic authentication with php and curl.

$USER_EMAIL=""; // #Insert your Google Account email here
$USER_PASS=""; //#Insert your password here

//array with some general data
$data = array(
  "Email" => $USER_EMAIL,
  "Passwd" => $USER_PASS, 
  "accountType" => "GOOGLE", 
  "source" => "curl-accountFeed-v1",
  "service" => "analytics"
);

$friends_url = 'https://www.google.com/accounts/ClientLogin';
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $friends_url);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 3);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

//http-post that contains the array as data
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_POSTFIELDS, $data);

//go shove the https secure connection verification
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);

curl_setopt($curl, CURLOPT_VERBOSE, 1);
			

$googleAuth = curl_exec($curl);

//optional : some feedback

//check if we get an error code from cUrl
//    echo curl_errno($curl)."\n";
//    echo curl_error($curl)."\n";
//print the body of the returned data
//    print_r($googleAuth);
//print all the headers
//    $info = curl_getinfo($curl);
//    print_r($info);

Somewhere in the garbled mess that curl returns is the authorization token; it starts with Auth=.

//the token runs from "Auth=" to the end of the response; trim the trailing newline
$start = strpos($googleAuth, "Auth=") + 5;
$Authtoken = trim(substr($googleAuth, $start));

//echo $Authtoken;
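
A slightly more defensive way to dig the token out is to split the response into key=value lines first. This is a sketch under the assumption that the ClientLogin body consists of SID=, LSID= and Auth= lines; the helper name parse_client_login() and the placeholder values are made up for the example.

```php
<?php
// Placeholder values; a real response holds long opaque strings.
$sampleResponse = "SID=placeholder-sid\nLSID=placeholder-lsid\nAuth=placeholder-token\n";

// parse_client_login() is a hypothetical helper name.
function parse_client_login($body)
{
    $fields = array();
    foreach (preg_split('/\r\n|\r|\n/', trim($body)) as $line) {
        // split on the first "=" only, the value itself may contain "="
        $pos = strpos($line, '=');
        if ($pos === false) {
            continue; // skip lines that are not key=value
        }
        $fields[substr($line, 0, $pos)] = substr($line, $pos + 1);
    }
    return $fields;
}

$fields = parse_client_login($sampleResponse);
$Authtoken = isset($fields['Auth']) ? $fields['Auth'] : null;

// the header for the next calls, built from the clean token
$headers = array("Authorization: GoogleLogin auth=" . $Authtoken);
```

If the login fails there is no Auth line, so $Authtoken ends up null and the script can stop before making the next call.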

I put that token in the header of the next calls and google assumes I am kosher : time to get the accounts feed :

//add the authorization token as extra header
$headers[] = "Authorization: GoogleLogin auth=".$Authtoken;


$friends_url = 'https://www.google.com/analytics/feeds/accounts/default';

	$curl = curl_init();
	curl_setopt($curl, CURLOPT_URL, $friends_url);
	curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 3);
	curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
	curl_setopt($curl, CURLOPT_HTTPHEADER, $headers); 
	curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
	curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
	curl_setopt($curl, CURLOPT_VERBOSE, 1);
	$googleAccounts = curl_exec($curl);

//check errors
echo curl_errno($curl);
echo curl_error($curl) ;
print_r($googleAccounts);

And there it is : a whole list with weird codes, my account list :) It seems easier than the other gData APIs.

note : the google code curl example does not show the ” auth=” part of the token, they assume you use the entire line “auth=…” as token.

Once I have my spectacular visitor count in a sidebar widget I’ll blog another post on this one.