How to get the URLs and snippets from the Google Trends details page. The news articles on the details page are loaded with an Ajax call; they are not present in the HTML source sent to the browser, so there is no easy way to scrape them.
The blog articles are pretty straightforward. First, the ugly fast way:
$mytitle = 'manuel benitez';
$mydate = ''; //2008-12-24
$html = file_get_contents('http://www.google.com/trends/hottrends?q='.urlencode($mytitle).'&date=&sa=X');
$start = strpos($html, '<div class="gsc-resultsbox-visible">');
$end = strpos($html, '<div class="gsc-trailing-more-results">');
$content = substr($html, $start, $end - $start);
echo $content;
That returns the blog snippets, ugly. The other way is regular pattern matching: you can grab the divs that wrap each content item, marked with
- div class="gs-title"
- div class="gs-relativePublishedDate"
- div class="gs-snippet"
- div class="gs-visibleUrl"
from the HTML source and organize them as a "Content" array, after which you can list the content items with your own markup or store them in a database.
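One caveat before the full version: strpos() returns false when a marker div is missing, and substr() would then cut from the wrong position. Here is a hedged guard for the cut step; the function wrapper is my addition, not part of the original script:

```php
<?php
// Guarded version of the cut step: returns '' instead of garbage when the
// marker divs are not found (e.g. if Google changes the page layout).
function cut_results($html) {
    $start = strpos($html, '<div class="gsc-resultsbox-visible">');
    $end   = strpos($html, '<div class="gsc-trailing-more-results">');
    if ($start === false || $end === false || $end < $start) {
        return '';
    }
    return substr($html, $start, $end - $start);
}
```

You could then write `$content = cut_results($html);` instead of the bare substr() call.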
//I assume $mytitle is taken from the $_GET array.

//class 'Content' with its members
class Content {
    var $id;
    var $title;
    var $pubdate;
    var $snippet;
    var $url;

    public function __construct($id) {
        $this->id = $id;
    }
}
//grab the source from the google page
$html = file_get_contents('http://www.google.com/trends/hottrends?q='.urlencode($mytitle).'&date=&sa=X');

//cut out the part I want
$start = strpos($html, '<div class="gsc-resultsbox-visible">');
$end = strpos($html, '<div class="gsc-trailing-more-results">');
$content = substr($html, $start, $end - $start);

//grab the divs that contain title, publish date, snippet and url with a regular pattern match
//(match against $content, the part we cut out, not the full $html)
preg_match_all('!<div class="gs-title">.*?</div>!si', $content, $titles);
preg_match_all('!<div class="gs-relativePublishedDate">.*?</div>!si', $content, $pubDates);
preg_match_all('!<div class="gs-snippet">.*?</div>!si', $content, $snippets);
preg_match_all('!<div class="gs-visibleUrl">.*?</div>!si', $content, $urls);
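As a quick sanity check that the pattern does what we want, here it is on a tiny made-up sample (the class names are as above, but the sample HTML is my own): the match keeps the whole div, markup included, which is why strip_tags() shows up later.

```php
<?php
// Demo on invented sample HTML: preg_match_all keeps the full div.
$sample = '<div class="gs-title"><a href="http://example.com">A blog post</a></div>'
        . '<div class="gs-snippet">Some snippet text</div>';

preg_match_all('!<div class="gs-title">.*?</div>!si', $sample, $titles);
preg_match_all('!<div class="gs-snippet">.*?</div>!si', $sample, $snippets);

echo $titles[0][0] . "\n";          // the whole title div, link included
echo strip_tags($snippets[0][0]);   // just the plain snippet text
```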
$Contents = array();

//organize them under Content
$count = 0;
foreach ($titles[0] as $title) {
    //make a new instance of Content
    $Contents[] = new Content($count);
    //add title
    $Contents[$count]->title = $title;
    $count++;
}
$count = 0;
foreach ($pubDates[0] as $pubDate) {
    //add publishing date (contains some linebreak markup, remove it with strip_tags)
    $Contents[$count]->pubdate = strip_tags($pubDate);
    $count++;
}

$count = 0;
foreach ($snippets[0] as $snippet) {
    //add snippet
    $Contents[$count]->snippet = $snippet;
    $count++;
}

$count = 0;
foreach ($urls[0] as $url) {
    //add display url
    $Contents[$count]->url = $url;
    $count++;
}
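As an aside, the separate loops can also be collapsed into one index-based loop. This restructuring is my own and assumes the four match arrays always have the same length, which only holds when every result item carries all four divs; the small sample arrays below stand in for the real matches:

```php
<?php
// One loop instead of four (my restructuring, same behaviour assuming the
// four match arrays line up one-to-one). Minimal stand-ins for the demo:
class Content {
    var $id; var $title; var $pubdate; var $snippet; var $url;
    public function __construct($id) { $this->id = $id; }
}

$titles   = array(array('<div class="gs-title">t</div>'));
$pubDates = array(array('<em>d</em>'));
$snippets = array(array('s'));
$urls     = array(array('u'));

$Contents = array();
$count = count($titles[0]);
for ($i = 0; $i < $count; $i++) {
    $Contents[$i] = new Content($i);
    $Contents[$i]->title   = $titles[0][$i];
    $Contents[$i]->pubdate = strip_tags($pubDates[0][$i]); //strip markup as in the original loops
    $Contents[$i]->snippet = $snippets[0][$i];
    $Contents[$i]->url     = $urls[0][$i];
}
```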
//leave $count as is, the number of content items in the 0-based array
//add rel="nofollow" to links to prevent pagerank assignment to blogs
for ($ct = 0; $ct < $count; $ct++) {
    $Contents[$ct]->url = preg_replace('/ target/', ' rel="nofollow" target', $Contents[$ct]->url);
    $Contents[$ct]->title = preg_replace('/ target/', ' rel="nofollow" target', $Contents[$ct]->title);
}
//it's complete, list all content items with some markup
for ($ct = 0; $ct < $count; $ct++) {
    echo '<h3>'.$Contents[$ct]->title.'</h3>';
    echo '<p><strong>'.$Contents[$ct]->pubdate.'</strong>: <em>'.$Contents[$ct]->snippet.'</em></p>';
    echo $Contents[$ct]->url.'<br />';
}
It ain't perfect, but it works. The syntax highlighter I use gets a bit confused by the preg_match_all statements containing unclosed divs, so copying the code from the blog may not work.
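The post mentions storing the items in a database as an alternative to echoing them. Here is a minimal sketch with PDO and SQLite; the table name, column names, and the in-memory database are my choices, not from the original script, and the sample row stands in for the real $Contents array the script builds:

```php
<?php
// Hedged sketch: persist the scraped items instead of echoing them.
// The sample row below stands in for the real $Contents array.
$Contents = array(
    (object) array('title' => 't1', 'pubdate' => 'd1', 'snippet' => 's1', 'url' => 'u1'),
);

$db = new PDO('sqlite::memory:');
$db->exec('CREATE TABLE content (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT, pubdate TEXT, snippet TEXT, url TEXT
)');

$stmt = $db->prepare('INSERT INTO content (title, pubdate, snippet, url) VALUES (?, ?, ?, ?)');
foreach ($Contents as $c) {
    $stmt->execute(array($c->title, $c->pubdate, $c->snippet, $c->url));
}
```

In the real script you would replace the sample array with the $Contents array built above, and point PDO at a file (e.g. `sqlite:trends.db`) instead of `:memory:`.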
Hi mate,
Meet you again here :)
This code works perfectly now, but maybe you can add some features, like a keyword search form for the trends we want, and then display the result like this:
hxxp:example.com/keyword+results.html
Or, if "google trends today" is typed, it becomes:
hxxp:example.com/google+trends+today.html
I hope you can write the search form code if you have time, and share it with us…
My Regards
Great script and thanks for sharing it. I'm not really sure how efficient this script is when run on a large number of keywords, but it's a start.