juust ~ php oddities


bing api with php and simplexml

juust | 17/09/2009

About scraping results off Bing: Bing use a set of about eight cookies. You can grab 200 results with PHP curl, as 20 pages of 10, but after the first 200 the Bing server checks for the cookies and, if they are missing, returns a blank page. I could fiddle with the curl cookie jar, but Bing also offer a straightforward API.
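To make the "20 pages of 10" concrete, here is a small sketch that lists the result-page URLs. It assumes Bing's `first` parameter is the 1-based position of the first result on a page, as in the scraping snippet in the "bing" post further down; `bing_page_urls()` is a made-up helper name, not part of any library.

```php
<?php
// Hypothetical helper: builds the URLs for $pages result pages of
// $perPage results each. Assumes Bing's "first" parameter is the
// 1-based position of the first result on a page (1, 11, 21, ...).
function bing_page_urls(string $query, int $pages = 20, int $perPage = 10): array
{
    $urls = array();
    for ($page = 0; $page < $pages; $page++) {
        $first = $page * $perPage + 1;
        $urls[] = 'http://www.bing.com/search?q=' . urlencode($query)
                . '&first=' . $first;
    }
    return $urls;
}

$urls = bing_page_urls('seo rank check');
echo count($urls) . "\n";  // number of pages
echo end($urls) . "\n";    // last page, starting at result 191
```

The sketch only builds the URLs. Going past result 200 would also need the cookies, for instance through curl's CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE options, which it does not attempt.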

Using the Bing API to list search results is easier.

Bing TOS : not for seo rank checks

In the last paragraph of the API guide, Bing give a quick recap of their TOS: you can do at most 7 queries per second, and using the results for SEO rank checks is explicitly prohibited.
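One way to stay under 7 queries per second is a small sliding-window throttle. This is a sketch of my own, not something from the Bing guide; `throttle_delay()` is a hypothetical helper that, given the timestamps of your recent requests, returns how long to sleep before the next one.

```php
<?php
// Hypothetical sliding-window throttle: given the timestamps (seconds,
// as floats) of recent requests, return how many seconds to wait so that
// at most $max requests fall within any $window-second span.
function throttle_delay(array $recent, float $now, int $max = 7, float $window = 1.0): float
{
    // keep only the requests inside the current window
    $inWindow = array_values(array_filter($recent, function ($t) use ($now, $window) {
        return $t > $now - $window;
    }));
    if (count($inWindow) < $max) {
        return 0.0;
    }
    sort($inWindow);
    // wait until enough old requests have dropped out of the window
    return $inWindow[count($inWindow) - $max] + $window - $now;
}

// usage: check before every API call, then record the call time
$sent = array();
foreach (array('alkmaar', 'serp') as $query) {
    $wait = throttle_delay($sent, microtime(true));
    if ($wait > 0) {
        usleep((int) ceil($wait * 1000000));
    }
    $sent[] = microtime(true);
    // ... send the API request for $query here ...
}
```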

The following snippets (text source) are therefore explicitly not to be used for Bing search engine results page ('SERP') rank checks.

bing api with simplexml

So here is one for web results using PHP SimpleXML. The web API (which uses namespaces) lets you retrieve at most 1000 results per term, at most 50 results per query; you specify the number of results and the offset, i.e. where to start grabbing results.
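With those two limits, fetching N results means splitting them into batches of at most 50. A minimal sketch; `bing_web_batches()` is a hypothetical helper, and whether the offset counts from 0 or from 1 is an assumption to check against the API guide (the snippet below starts at 1), so the starting offset is a parameter.

```php
<?php
// Hypothetical helper: split a request for $wanted results into
// (count, offset) pairs, using the limits quoted above: at most
// $perQuery results per request and $maxTotal results per term.
// $start is the offset of the very first result (0- or 1-based is an
// assumption to verify against the API guide).
function bing_web_batches(int $wanted, int $start = 0, int $perQuery = 50, int $maxTotal = 1000): array
{
    $wanted = min($wanted, $maxTotal);
    $batches = array();
    for ($done = 0; $done < $wanted; $done += $perQuery) {
        $batches[] = array(
            'count'  => min($perQuery, $wanted - $done),
            'offset' => $start + $done,
        );
    }
    return $batches;
}

// prints web.count=50&web.offset=0, then 50/50, then 20/100
foreach (bing_web_batches(120) as $b) {
    echo 'web.count=' . $b['count'] . '&web.offset=' . $b['offset'] . "\n";
}
```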

    $Appid  = "A_VERY_LONG_STRING";
    $Query  = "seo rank check";
    $Numres = 50; // results per request, max 50
    $Offset = 1;  // where to start grabbing results, up to 1000

    // build the URL without line breaks inside the string, and urlencode the query
    $url = 'http://api.search.live.net/xml.aspx?Appid=' . $Appid
         . '&query=' . urlencode($Query)
         . '&sources=web'
         . '&web.count=' . $Numres
         . '&web.offset=' . $Offset;

    $feed = simplexml_load_file($url);
    if ($feed === false) {
        die('could not load the Bing response');
    }

    // use the web: namespace
    $children = $feed->children('http://schemas.microsoft.com/LiveSearch/2008/04/XML/web');
    foreach ($children->Web->Results->WebResult as $d) {
        echo $d->Title . '<br />';
        echo $d->Description . '<br />';
        echo $d->Url . '<br />';
        echo $d->DisplayUrl . '<br />';
    }

...and one for images, using PHP SimpleXML:

    $Appid  = "A_VERY_LONG_STRING";
    $Query  = "alkmaar";
    $Numres = 10;
    $Offset = 1;

    $url  = 'http://api.search.live.net/xml.aspx?';
    $url .= 'Appid=' . $Appid;
    $url .= '&query=' . urlencode($Query);
    $url .= '&sources=image';
    $url .= '&image.count=' . $Numres;
    $url .= '&image.offset=' . $Offset;

    $feed = simplexml_load_file($url);

    // use the mms: namespace
    $children = $feed->children('http://schemas.microsoft.com/LiveSearch/2008/04/XML/multimedia');

    echo '<ul id="resultList">';
    foreach ($children->Image->Results->ImageResult as $d) {
        // Url is the page containing the image; DisplayUrl is the display form of that address
        echo '<li class="resultlistitem"><a href="' . $d->Url . '">' . $d->Title . '</a><br />';
        echo '<img src="' . $d->Thumbnail->Url . '" /><br />'
           . $d->Thumbnail->ContentType . '<br />'
           . $d->Thumbnail->Height . '<br />'
           . $d->Thumbnail->Width . '<br />'
           . $d->Thumbnail->FileSize . '<br />'
           . '</li>';
    }
    echo '</ul>';
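Building the URL by string concatenation works, but http_build_query() urlencodes every value for you, which matters once a query contains spaces or accents. A sketch using the same parameter names as the snippet above; `bing_image_url()` is a hypothetical helper.

```php
<?php
// Hypothetical helper: same request URL as above, built with
// http_build_query(), which urlencodes each value ("alkmaar centrum"
// becomes "alkmaar+centrum"). Dots in the keys are left as they are.
function bing_image_url(string $appid, string $query, int $count = 10, int $offset = 1): string
{
    $params = array(
        'Appid'        => $appid,
        'query'        => $query,
        'sources'      => 'image',
        'image.count'  => $count,
        'image.offset' => $offset,
    );
    // pass '&' explicitly so the arg_separator.output ini setting can't change it
    return 'http://api.search.live.net/xml.aspx?' . http_build_query($params, '', '&');
}

echo bing_image_url('A_VERY_LONG_STRING', 'alkmaar centrum') . "\n";
```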

I actually like that API; I am going to use it.

bing api with json

Bing seem to prefer that you use JSON, since it uses less bandwidth. Based on their example in the API basics guide:

    $Appid  = "A_VERY_LONG_STRING";
    $Numres = 10;
    $Offset = 1;
    $Query  = 'alkmaar';

    $url  = 'http://api.search.live.net/json.aspx?';
    $url .= 'Appid=' . $Appid;
    $url .= '&query=' . urlencode($Query);
    $url .= '&sources=image';
    $url .= '&image.count=' . $Numres;
    $url .= '&image.offset=' . $Offset;

    $response = file_get_contents($url);
    $jsonobj  = json_decode($response);

    echo '<ul id="resultList">';
    foreach ($jsonobj->SearchResponse->Image->Results as $value) {
        echo '<li class="resultlistitem"><a href="' . $value->Url . '">';
        echo '<img src="' . $value->Thumbnail->Url . '"></a></li>';
    }
    echo '</ul>';
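The loop above assumes the request succeeded. Here is a sketch that checks the decoded structure first, run against a made-up, trimmed JSON string containing only the properties the snippet reads; `bing_image_thumbs()` is a hypothetical helper.

```php
<?php
// Made-up, trimmed response with only the properties used above.
$response = '{"SearchResponse":{"Image":{"Results":['
          . '{"Title":"Alkmaar","Url":"http://www.example.com/page",'
          . '"Thumbnail":{"Url":"http://www.example.com/thumb.jpg"}}]}}}';

// Hypothetical helper: collect page URL / thumbnail URL pairs; returns an
// empty array when the JSON is invalid or contains no image results.
function bing_image_thumbs(string $response): array
{
    $jsonobj = json_decode($response);
    // isset() is safe on the whole chain, even if json_decode returned null
    if (!isset($jsonobj->SearchResponse->Image->Results)) {
        return array();
    }
    $thumbs = array();
    foreach ($jsonobj->SearchResponse->Image->Results as $value) {
        $thumbs[] = array('url' => $value->Url, 'thumb' => $value->Thumbnail->Url);
    }
    return $thumbs;
}

foreach (bing_image_thumbs($response) as $t) {
    echo $t['url'] . ' ' . $t['thumb'] . "\n";
}
```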

Of course there is also the old RSS option, which doesn't require an AppID but also falls under the API 2.0 TOS, and a SOAP option.
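The post doesn't give the RSS endpoint, so this only sketches the parsing side: RSS 2.0 items carry no namespace, so SimpleXML's plain property access is enough. The feed string is made up, and `rss_items()` is a hypothetical helper.

```php
<?php
// Made-up RSS 2.0 feed; the real Bing RSS URL is not given in this post.
$rss = <<<XML
<rss version="2.0">
  <channel>
    <title>alkmaar</title>
    <item>
      <title>First result</title>
      <link>http://www.example.com/1</link>
      <description>First description</description>
    </item>
    <item>
      <title>Second result</title>
      <link>http://www.example.com/2</link>
      <description>Second description</description>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: title/link pairs from an RSS 2.0 string.
// No namespaces involved, so plain property access works.
function rss_items(string $rss): array
{
    $feed = simplexml_load_string($rss);
    $items = array();
    foreach ($feed->channel->item as $item) {
        $items[] = array('title' => (string) $item->title, 'link' => (string) $item->link);
    }
    return $items;
}

foreach (rss_items($rss) as $item) {
    echo $item['title'] . ' ' . $item['link'] . "\n";
}
```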

Other sources:
there is a Bing API PHP class over at routecafe, and a jQuery Bing plugin using JSON over at Einar Otto Stangvik’s blog.

         

bing

juust | 21/08/2009

For completeness, PHP Bing SERP scraping:

    $query = 'serp';
    $page  = 1;
    $start = ($page - 1) * 10;
    $url   = 'http://www.bing.com/search?q=' . urlencode($query) . '&first=' . ($start + 1);

    $curl_handle = curl_init();
    curl_setopt($curl_handle, CURLOPT_URL, $url);
    curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
    curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
    $return = curl_exec($curl_handle);
    curl_close($curl_handle);
    if ($return === false) {
        die('request failed');
    }

    // cut the page into one chunk per <h3> result heading
    $parts = explode('<h3>', $return);

    for ($j = 1; $j < count($parts); $j++) {
        $p = $parts[$j];
        // group 1 = href, group 3 = link text up to </a>
        if (preg_match('#<a\s+.*?href=[\'"]([^\'"]+)[\'"]\s*(?:title=[\'"]([^\'"]+)[\'"])?.*?>((?:(?!</a>).)*)</a>#i', $p, $urls)) {
            echo "position: " . ($start + $j) . " url: " . $urls[1] . " title: " . $urls[3] . '<br />';
        }
    }
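To check the split-and-regex step without hitting Bing, it can be wrapped in a function and run on a local string. The sample markup is made up (Bing's real result HTML is different and changes over time), `parse_bing_results()` is a hypothetical name, and strip_tags() is added to drop highlighting tags such as <strong> from the titles.

```php
<?php
// Hypothetical wrapper around the split-and-regex step above.
function parse_bing_results(string $html, int $start = 0): array
{
    $results = array();
    $parts = explode('<h3>', $html);
    for ($j = 1; $j < count($parts); $j++) {
        // first link in the chunk: group 1 = href, group 3 = link text
        if (preg_match('#<a\s+.*?href=[\'"]([^\'"]+)[\'"]\s*(?:title=[\'"]([^\'"]+)[\'"])?.*?>((?:(?!</a>).)*)</a>#i', $parts[$j], $m)) {
            $results[] = array(
                'position' => $start + $j,
                'url'      => $m[1],
                'title'    => strip_tags($m[3]),
            );
        }
    }
    return $results;
}

// made-up markup, not Bing's real HTML
$html = '<h3><a href="http://www.example.com/a">Result <strong>A</strong></a></h3>'
      . '<h3><a href="http://www.example.com/b">Result B</a></h3>';

foreach (parse_bing_results($html) as $r) {
    echo 'position: ' . $r['position'] . ' url: ' . $r['url'] . ' title: ' . $r['title'] . "\n";
}
```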
         

ga api sample : get pageviews

juust | 13/05/2009

I was going to put this online: how to get the pageviews out of the Google Analytics API, using SimpleXML and PHP. Google use three namespaces in the output feed, which makes it less easily accessible, so here's a quick sample of how to get your site's pageviews out of it:

    // ids        = site identifier (from the site data feed)
    // metrics    = what I want to see
    // start-date, end-date = the reporting period

    $feedUri = "https://www.google.com/analytics/feeds/data?ids=ga:10516419&metrics=ga:pageviews&start-date=2009-04-01&end-date=2009-05-01";

    // for $Authtoken: see previous post
    $headers = array("Authorization: GoogleLogin auth=" . $Authtoken);

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $feedUri);
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 3);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
    // verify Google's certificate (set CURLOPT_CAINFO to a CA bundle if PHP has none)
    curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 2);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, true);
    // debug output; remove once it works
    curl_setopt($curl, CURLOPT_VERBOSE, 1);

    // get the string containing the xml file
    $gA = curl_exec($curl);
    curl_close($curl);

The feed has three namespaces (Atom, OpenSearch and dxp, the analytics namespace). A simple approach is to access the entry tags from the Atom namespace: each entry contains one dxp:metric line, and that line holds the answer to the question:

<dxp:metric confidenceInterval='0.0' name='ga:pageviews' type='integer' value='755'/>

    // load the string into a SimpleXML object
    $feed = simplexml_load_string($gA);

    // take the Atom namespace
    $children = $feed->children('http://www.w3.org/2005/Atom');

    // take the entry tags
    $parts = $children->entry;
    foreach ($parts as $entry) {
        // from the entry tag, access the dxp namespace
        $dxp = $entry->children('http://schemas.google.com/analytics/2009');

        // METRIC contains the answer to the question:
        // grab the VALUE attribute from the METRIC tag
        echo (string) $dxp->metric->attributes()->value;
    }
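Instead of walking children() twice, you can register a prefix for each namespace and let XPath fetch the value attribute directly. This is an alternative technique, not the one used above; the prefixes `a` and `dxp` are local names chosen for the query, and the feed string is a made-up, trimmed version of the data feed (the OpenSearch elements are left out).

```php
<?php
// Made-up, trimmed data feed: Atom as default namespace plus dxp.
$gA = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dxp="http://schemas.google.com/analytics/2009">
  <entry>
    <dxp:metric confidenceInterval="0.0" name="ga:pageviews" type="integer" value="755"/>
  </entry>
</feed>
XML;

$feed = simplexml_load_string($gA);
// "a" and "dxp" are prefixes local to this XPath query
$feed->registerXPathNamespace('a', 'http://www.w3.org/2005/Atom');
$feed->registerXPathNamespace('dxp', 'http://schemas.google.com/analytics/2009');

$values = array();
foreach ($feed->xpath('//a:entry/dxp:metric[@name="ga:pageviews"]/@value') as $attr) {
    $values[] = (int) $attr; // cast the attribute node to its numeric value
}
echo implode(', ', $values) . "\n";
```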

The (string) typecast is important: SimpleXML normally returns a SimpleXMLElement object, and forcing a string type gives you the actual value attribute of the ga:pageviews metric (as a string such as '755'; cast it to (int) if you need a number).
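A quick way to see what the cast changes, using a made-up metric tag without the dxp prefix so it parses on its own:

```php
<?php
// Made-up metric tag (no dxp prefix, so it parses standalone).
$metric = simplexml_load_string('<metric name="ga:pageviews" type="integer" value="755"/>');
$attr = $metric->attributes()->value;

echo get_class($attr) . "\n";                 // SimpleXMLElement
echo var_export((string) $attr, true) . "\n"; // '755'
echo ((int) $attr + 1) . "\n";                // 756
```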

         
