Google Trends III

How to get the URLs and snippets from the Google Trends details page. The news articles on the details page are loaded with an Ajax call, so they are not in the HTML source sent to the browser. There is no easy way to scrape those.

The blog articles are pretty straightforward. First, the ugly, fast way:

```php
<?php
$mytitle = 'manuel benitez';
$mydate = ''; // date in the format 2008-12-24, or empty
$html = file_get_contents('http://www.google.com/trends/hottrends?q='.urlencode($mytitle).'&date='.urlencode($mydate).'&sa=X');
// the blog results sit between these two markers
$start = strpos($html, '<div class="gsc-resultsbox-visible">');
$end = strpos($html, '<div class="gsc-trailing-more-results">');
$content = substr($html, $start, $end - $start);
echo $content;
```
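The fast version fails silently when a marker is missing: strpos() returns false, and substr() then returns the wrong part of the page. A minimal sketch of a safer cut, using a hypothetical helper extract_between() and a made-up page fragment instead of a live request:

```php
<?php
// Hypothetical helper: return the HTML from $startMarker up to (not including)
// $endMarker, or null when either marker is missing.
function extract_between(string $html, string $startMarker, string $endMarker): ?string
{
    $start = strpos($html, $startMarker);
    if ($start === false) {
        return null;
    }
    // Only look for the end marker after the start marker.
    $end = strpos($html, $endMarker, $start + strlen($startMarker));
    if ($end === false) {
        return null;
    }
    return substr($html, $start, $end - $start);
}

// Made-up fragment standing in for the downloaded page.
$page = '<p>header</p><div class="gsc-resultsbox-visible"><div class="gs-title">A</div></div>'
      . '<div class="gsc-trailing-more-results">more</div>';
$block = extract_between($page, '<div class="gsc-resultsbox-visible">', '<div class="gsc-trailing-more-results">');
echo $block ?? 'markers not found', "\n";
```

Returning null lets the caller decide what to do when Google changes its markup, instead of echoing a random slice of the page.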

That returns the blog snippets, but ugly. The other way is regular expression matching: you can grab the divs that each content item has, marked with

  • div class="gs-title"
  • div class="gs-relativePublishedDate"
  • div class="gs-snippet"
  • div class="gs-visibleUrl"

from the HTML source and organize them as an array of “Content” objects, after which you can list the content items with your own markup or store them in a database.

```php
<?php
// I assume $mytitle is taken from the $_GET array.

// class 'Content' holds the fields of one content item
class Content {
    public $id;
    public $title = '';
    public $pubdate = '';
    public $snippet = '';
    public $url = '';

    public function __construct($id) {
        $this->id = $id;
    }
}

// grab the source from the Google page
$html = file_get_contents('http://www.google.com/trends/hottrends?q='.urlencode($mytitle).'&date=&sa=X');

// cut out the part I want; strpos() returns false if a marker is missing
$start = strpos($html, '<div class="gsc-resultsbox-visible">');
$end = strpos($html, '<div class="gsc-trailing-more-results">');
if ($start === false || $end === false) {
    exit('No blog results found on the page.');
}
$content = substr($html, $start, $end - $start);

// grab the divs that contain title, publish date, snippet and url with regular pattern match
preg_match_all('!<div class="gs-title">.*?</div>!si', $content, $titles);
preg_match_all('!<div class="gs-relativePublishedDate">.*?</div>!si', $content, $pubDates);
preg_match_all('!<div class="gs-snippet">.*?</div>!si', $content, $snippets);
preg_match_all('!<div class="gs-visibleUrl">.*?</div>!si', $content, $urls);

$Contents = array();

// organize them under Content: one instance per title
$count = 0;
foreach ($titles[0] as $title) {
    $Contents[] = new Content($count);
    $Contents[$count]->title = $title;
    $count++;
}

// the other fields are paired with the titles by position
foreach ($pubDates[0] as $i => $pubDate) {
    // strip_tags removes the wrapping div, which would force a line break inside <strong>
    if (isset($Contents[$i])) {
        $Contents[$i]->pubdate = strip_tags($pubDate);
    }
}
foreach ($snippets[0] as $i => $snippet) {
    if (isset($Contents[$i])) {
        $Contents[$i]->snippet = $snippet;
    }
}
foreach ($urls[0] as $i => $url) {
    if (isset($Contents[$i])) {
        $Contents[$i]->url = $url;
    }
}

// $count is the number of content items (0-based array)
// add rel="nofollow" to the links so the blogs get no PageRank from them
for ($ct = 0; $ct < $count; $ct++) {
    $Contents[$ct]->url = preg_replace('/ target/', ' rel="nofollow" target', $Contents[$ct]->url);
    $Contents[$ct]->title = preg_replace('/ target/', ' rel="nofollow" target', $Contents[$ct]->title);
}

// it's complete: list all content items with some markup
for ($ct = 0; $ct < $count; $ct++) {
    echo '<h3>'.$Contents[$ct]->title.'</h3>';
    echo '<p><strong>'.$Contents[$ct]->pubdate.'</strong>: <em>'.$Contents[$ct]->snippet.'</em></p>';
    echo $Contents[$ct]->url.'<br />';
}
```
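The four preg_match_all calls and four loops can be folded into one function. A sketch, assuming the same class names: each pattern gets a capture group, so $m[1] holds the inner HTML without the wrapping div, and the fields are paired by position. parse_results() is a hypothetical name and the sample HTML is made up.

```php
<?php
// Hypothetical helper: collect all items' fields in one pass.
function parse_results(string $content): array
{
    $classes = [
        'title'   => 'gs-title',
        'pubdate' => 'gs-relativePublishedDate',
        'snippet' => 'gs-snippet',
        'url'     => 'gs-visibleUrl',
    ];
    $fields = [];
    foreach ($classes as $name => $class) {
        // (.*?) captures the div's inner HTML; $m[1] is the list of captures
        $pattern = '!<div class="' . preg_quote($class, '!') . '">(.*?)</div>!si';
        preg_match_all($pattern, $content, $m);
        $fields[$name] = $m[1];
    }
    $items = [];
    foreach ($fields['title'] as $i => $title) {
        // missing fields default to an empty string
        $items[] = [
            'title'   => $title,
            'pubdate' => trim(strip_tags($fields['pubdate'][$i] ?? '')),
            'snippet' => $fields['snippet'][$i] ?? '',
            'url'     => $fields['url'][$i] ?? '',
        ];
    }
    return $items;
}

$sample = '<div class="gs-title"><a href="http://a.example/" target="_blank">Post A</a></div>'
        . '<div class="gs-relativePublishedDate">2 days ago</div>'
        . '<div class="gs-snippet">First snippet</div>'
        . '<div class="gs-visibleUrl">a.example</div>'
        . '<div class="gs-title">Post B</div>'
        . '<div class="gs-snippet">Second snippet</div>';
$items = parse_results($sample);
foreach ($items as $item) {
    echo $item['title'], ' | ', $item['pubdate'], "\n";
}
```

Pairing by position only works when every item has every field: an item missing its date in the middle of the list would shift all later dates by one.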

It ain’t perfect, but it works. The highlighter I use gets a bit confused by the preg_match_all statements containing unclosed divs, so copying the code from the blog may not work.
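A pattern like .*?</div> stops at the first closing div, so a field that contains a nested div gets cut short. Another option is PHP's DOM extension, which is a different technique from the regex script above. The sketch below assumes the same class names; dom_results() and inner_html() are hypothetical helpers, and the sample HTML (with one nested div) is invented.

```php
<?php
// Inner HTML of a node: its children serialized, without the node's own tag.
function inner_html(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

function dom_results(string $content): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // no warnings for sloppy HTML
    $doc->loadHTML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($doc);

    // Mark every link nofollow (instead of the ' target' string replace).
    foreach ($xpath->query('//a') as $a) {
        $a->setAttribute('rel', 'nofollow');
    }

    // All divs carrying a given class, in document order.
    $byClass = function (string $class) use ($xpath): DOMNodeList {
        return $xpath->query(sprintf(
            '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]', $class));
    };
    $titles   = $byClass('gs-title');
    $pubdates = $byClass('gs-relativePublishedDate');
    $snippets = $byClass('gs-snippet');
    $urls     = $byClass('gs-visibleUrl');

    // Pair the fields by position, like the regex version.
    $items = [];
    for ($i = 0; $i < $titles->length; $i++) {
        $items[] = [
            'title'   => inner_html($titles->item($i)),
            'pubdate' => $i < $pubdates->length ? trim($pubdates->item($i)->textContent) : '',
            'snippet' => $i < $snippets->length ? inner_html($snippets->item($i)) : '',
            'url'     => $i < $urls->length ? inner_html($urls->item($i)) : '',
        ];
    }
    return $items;
}

$sample = '<div class="gs-title"><a href="http://a.example/" target="_blank">Post A</a></div>'
        . '<div class="gs-relativePublishedDate">2 days ago</div>'
        . '<div class="gs-snippet">First <div>inner</div> tail</div>'
        . '<div class="gs-visibleUrl">a.example</div>';
$items = dom_results($sample);
```

The parser keeps the nested div and the text after it, which the non-greedy regex would drop. It needs PHP's dom extension, which is usually enabled.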

Posted in google.

6 Comments

  1. Pingback: SEO underWorld » Blog Archive » 9 Epic SEO Scripts

  2. Hi mate,

    Good to see you here again :)

    This code works perfectly now, but maybe you could add some features, like a keyword search form for the keyword trends we need, which then displays the results like this:

    hxxp:example.com/keyword+results.html

    or if the keyword typed is “google trends today”,

    it would become:

    hxxp:example.com/google+trends+today.html

    I hope you can write the search form code if you have time, and share it with us…

    My Regards

  3. Great script, and thanks for sharing it. Not really sure how efficient this script is when run on a large number of keywords, but it's a start.
