Google Trends III

How to get the URLs and snippets from the Google Trends details page. The news articles on the details page are loaded with an Ajax call; they are not sent to the browser in the HTML source, so there is no easy way to scrape them.

The blog articles are pretty straightforward. First, the ugly fast way:

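$mytitle='manuel benitez';
$mydate=''; //2008-12-24
$html=file_get_contents('http://www.google.com/trends/hottrends?q='.urlencode($mytitle).'&date='.urlencode($mydate).'&sa=X');
//the marker strings are missing here: fill in the html that opens and closes the blog results block
$startmarker='...';
$endmarker='...';
$start = strpos($html, $startmarker);
$end = strpos($html, $endmarker);
$content = substr($html, $start, $end-$start);
echo $content;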
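
One thing the fast way glosses over: strpos() returns false when a marker is not in the page, and substr() then quietly hands back the wrong slice. A minimal sketch of a safer cut follows. The helper name cutBetween() and the markers in the demo are made up for illustration, they are not part of the original script, and unlike the code above this version leaves the opening marker out of the result.

```php
<?php
// Hypothetical helper (not from the original script): return the text
// between two markers, or null when either marker is missing.
function cutBetween(string $html, string $startMarker, string $endMarker): ?string
{
    $start = strpos($html, $startMarker);
    if ($start === false) {
        return null; // opening marker not found
    }
    $start += strlen($startMarker); // skip past the opening marker itself
    $end = strpos($html, $endMarker, $start);
    if ($end === false) {
        return null; // closing marker not found after the opening one
    }
    return substr($html, $start, $end - $start);
}

// Demo with made-up markers; the real page uses its own markup.
$sample = '<p>intro</p><!--begin--><div>snippet</div><!--end--><p>footer</p>';
$block = cutBetween($sample, '<!--begin-->', '<!--end-->');
echo $block === null ? 'markers not found' : $block;
```

Checking for null before echoing makes it obvious when Google changes the page and the markers stop matching.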

The fast way returns the blog snippets, raw and ugly. The other way is regular expression pattern matching: you can grab the divs that each content item has, marked with

  • div class="gs-title"
  • div class="gs-relativePublishedDate"
  • div class="gs-snippet"
  • div class="gs-visibleUrl"

from the HTML source and organize them in an array of “Content” objects, after which you can list the content items with your own markup or store them in a database.

//I assume $mytitle is taken from the $_GET array.

//class 'Content' with its members
class Content {
	public $id;
	public $title;
	public $pubdate;
	public $snippet;
	public $url;
	
	public function __construct($id) {
		$this->id=$id;
	}
}

//grab the source from the google page
$html=file_get_contents('http://www.google.com/trends/hottrends?q='.urlencode($mytitle).'&date=&sa=X');

//cut out the part I want
//the marker strings are missing here: fill in the html that opens and closes the blog results block
$startmarker='...';
$endmarker='...';
$start = strpos($html, $startmarker);
$end = strpos($html, $endmarker);
//fall back to the whole page if the markers are not found
$content = ($start !== false && $end !== false) ? substr($html, $start, $end-$start) : $html;

//grab the divs that contain title, publish date, snippet and url with regular pattern match
preg_match_all('!<div class="gs-title">.*?<\/div>!si', $content, $titles);
preg_match_all('!<div class="gs-relativePublishedDate">.*?<\/div>!si', $content, $pubDates);
preg_match_all('!<div class="gs-snippet">.*?<\/div>!si', $content, $snippets);
preg_match_all('!<div class="gs-visibleUrl">.*?<\/div>!si', $content, $urls);

$Contents = array();

//organize them under Content, one item per title
foreach($titles[0] as $i => $title) {
	//make a new instance of Content
	$item = new Content($i);
	//add title
	$item->title=$title;
	//add publishing date (contains some linebreak, remove it with strip_tags)
	$item->pubdate=isset($pubDates[0][$i]) ? strip_tags($pubDates[0][$i]) : '';
	//add snippet
	$item->snippet=isset($snippets[0][$i]) ? $snippets[0][$i] : '';
	//add display url
	$item->url=isset($urls[0][$i]) ? $urls[0][$i] : '';
	$Contents[] = $item;
}

//$count is the number of content-items in the 0-based array
$count=count($Contents);

//add rel=nofollow to links to prevent pagerank assignment to blogs
for($ct=0;$ct<$count;$ct++) {
	$Contents[$ct]->url = str_replace(' target', ' rel="nofollow" target', $Contents[$ct]->url);
	$Contents[$ct]->title = str_replace(' target', ' rel="nofollow" target', $Contents[$ct]->title);
}

//it's complete, list all content-items with some markup (the tags are only an example, use your own)
for($ct=0;$ct<$count;$ct++) {
	echo '<h3>'.$Contents[$ct]->title.'</h3>';
	echo '<p>'.$Contents[$ct]->pubdate.':'.$Contents[$ct]->snippet.'</p>';
	echo $Contents[$ct]->url.'<br />';
}

It ain’t perfect, but it works. The highlighter I use gets a bit confused by the preg_match_all statements containing unclosed divs, so copying the code from the blog may not work.
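
If the regular expressions give you trouble (a nested div inside a snippet will cut the match short), a DOM parser is a possible alternative. This is a swapped-in technique, not what the script above does. The sketch below assumes PHP's DOM extension is available, runs on a made-up sample fragment rather than the real page, and uses a hypothetical helper named textsByClass(). It returns plain text only, so the links inside the title divs are lost.

```php
<?php
// Made-up sample shaped like the divs listed above; the real page
// markup may differ, so treat this layout as an assumption.
$sample = '<div class="gs-result">'
    . '<div class="gs-title"><a href="http://example.com/post" target="_blank">A post</a></div>'
    . '<div class="gs-relativePublishedDate">2 hours ago</div>'
    . '<div class="gs-snippet">Some <b>snippet</b> text</div>'
    . '<div class="gs-visibleUrl">example.com</div>'
    . '</div>';

// Hypothetical helper: collect the trimmed text of every div carrying the given class.
function textsByClass(DOMXPath $xpath, string $class): array
{
    $texts = [];
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " ' . $class . ' ")]';
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

libxml_use_internal_errors(true); // scraped html is rarely valid, keep the warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($sample);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$titles   = textsByClass($xpath, 'gs-title');
$pubDates = textsByClass($xpath, 'gs-relativePublishedDate');
$snippets = textsByClass($xpath, 'gs-snippet');
$urls     = textsByClass($xpath, 'gs-visibleUrl');

// One array entry per title; missing fields become empty strings.
$items = [];
foreach ($titles as $i => $title) {
    $items[] = [
        'title'   => $title,
        'pubdate' => $pubDates[$i] ?? '',
        'snippet' => $snippets[$i] ?? '',
        'url'     => $urls[$i] ?? '',
    ];
}
print_r($items);
```

To keep the links, you could read the href attribute from the a element inside each gs-title div instead of taking textContent.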


5 Comments

  2. Hi mate,

    Meet you again here :)

    This code work perfectly now but maybe you can add some features like keyword search result form for searching keyword trends we need and then display result like this:

    hxxp:example.com/keyword+results.html

    or if in keyword typed “google trends today”

    will become:

    hxxp:example.com/google+trends+today.html

    hoped you can make search form code if do you have time and share it for us…

    My Regards

  3. Great script and thanks for sharing it. Not really sure how efficient this script is when run on a large number of keywords, but it's a start.
